A Virtualization-Assisted Full-System Simulation Approach for the Verification of System Inter-Component Interactions


Author:	吳昕益 Wu, Hsin-I
Thesis Title:	協同行為導向之系統模擬平台應用於平行運作系統設計、除錯 A Virtualization-Assisted Full-System Simulation Approach for the Verification of System Inter-Component Interactions
Advisor:	蔡仁松 Tsay, Ren-Song
Oral Defense Committee:	邱瀞德 Chiu, Ching-Te; 李哲榮 Lee, Che-Rung; 蘇培陞 Su, Pei-Sheng; 蘇泓萌 Su, Hung-Meng
Degree:	Doctoral
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication:	2020
Academic Year of Graduation:	108 (ROC calendar)
Language:	Chinese
Number of Pages:	64
Keywords:	Modeling, Performance analysis, Simulation, Virtualization

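Full-system simulation is critical to the design and verification of embedded systems. Its flexibility and early availability make it particularly well suited to exploring and verifying system behavior early in the design process. Traditional simulation acceleration approaches, however, usually run into difficulties with performance, accuracy, or scalability. This work therefore proposes VIRA (VIRtualization-Assisted), an approach for building a fast, accurate, and scalable full-system simulator that overcomes these difficulties. To improve simulation performance, VIRA executes hardware-assisted components (HACs) on host hardware devices to obtain native-execution performance. To ensure accuracy and correct data dependencies, this work proposes a deterministic timing model that includes bus contention delay. To improve extensibility, VIRA integrates software-modeled components (SMCs) to support newly added functions and uses a fast data pass-through mechanism to reduce the communication overhead between simulated components. We validate the proposed virtualization-assisted full-system simulation by implementing it on a commercially available SoC (System-on-Chip) board. Experimental results show that, in addition to producing correct inter-component interaction results, the simulator runs 58 to 625 times faster than a commercial functional simulator.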

In this thesis we propose VIRA (VIRtualization-Assisted), a full-system simulation approach that achieves near-real-time performance through hardware acceleration based on virtualization techniques. Traditional acceleration approaches generally cannot capture inter-component interactions because the simulation progress of each component is unpredictable. VIRA leverages an existing hardware virtualization framework and relies on three key implementation techniques to achieve fast and accurate full-system simulation. First, it uses the trap mechanism of the virtualization framework to intercept inter-component interactions precisely, without checking every data access, while still maintaining a deterministic chronological order of those interactions. Second, it provides accurate system performance estimation for early system-level design by integrating component timing models, interrupt effects, and bus contention analysis. Third, it achieves near-real-time performance by executing software-simulated and hardware-simulated components on the same host machine, which minimizes the overhead of inter-component data exchange. We implement the proposed approach on a virtualization-enabled off-the-shelf System-on-Chip board to demonstrate its effectiveness. The experiments show that VIRA always produces deterministic results while running 58 to 625 times faster than a commercial tool, and that its system performance estimates are within 3% to 6% of real systems. Moreover, the deterministic full-system simulator incurs only 2% to 57% overhead compared with ideal native execution on the same host hardware devices.
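The idea of committing inter-component interactions in a deterministic chronological order, with a bus contention delay added on top, can be illustrated with a small sketch. This is not the thesis implementation: the `Access` record, the `replay` function, the single shared bus, and the fixed `bus_latency` are all assumptions made for illustration, and the sketch replays a known set of accesses offline rather than intercepting them through virtualization traps.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Access:
    """A shared-data access, ordered by simulated time, then component name."""
    time: int                                    # simulated issue time
    component: str                               # hypothetical component name (tie-breaker)
    kind: str = field(compare=False)             # "read" or "write"
    value: int = field(compare=False, default=0)  # payload for writes


def replay(accesses, bus_latency=4):
    """Commit accesses in simulated-time order on one shared bus.

    Returns (component, kind, issue_time, finish_time, shared_value) tuples.
    The result depends only on simulated timestamps, not on the order in
    which the host happened to deliver the accesses.
    """
    queue = list(accesses)
    heapq.heapify(queue)
    bus_free_at = 0   # simulated time at which the bus becomes idle
    shared = 0        # the single shared variable in this toy model
    log = []
    while queue:
        acc = heapq.heappop(queue)
        start = max(acc.time, bus_free_at)  # wait while the bus is busy
        finish = start + bus_latency
        bus_free_at = finish
        if acc.kind == "write":
            shared = acc.value
        log.append((acc.component, acc.kind, acc.time, finish, shared))
    return log


# Host arrival order differs between the two runs; the committed order does not.
arrivals = [
    Access(10, "cpu0", "write", 7),
    Access(12, "dsp", "read"),
    Access(5, "cpu1", "write", 3),
]
assert replay(arrivals) == replay(reversed(arrivals))
```

In this toy run the read issued at simulated time 12 finds the bus busy until time 14, so it completes at 18 rather than 16; the two extra time units are the contention delay. A real simulator cannot see future accesses in advance and would need a synchronization rule to decide when an access is safe to commit.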

Abstract (Chinese)
Abstract
Contents
I.    Introduction
II.    Related work
2.1    Software-Based Simulation Acceleration
2.2    Hardware-Based Simulation Acceleration
2.3    High Abstraction Modeling Approaches
III.    THE VIRTUALIZATION-ASSISTED APPROACH
3.1    Hardware Annotation
3.2    Data-Dependency-based Synchronization
3.3    Runtime Operation Timing Calculation
3.4    Contention-Aware Timing Model
IV.    IMPLEMENTATION
4.1    Support Both HACs and SMCs
4.2    Integrate HACs and SMCs through Fast Data Path
4.3    Intercept Synchronization Points
4.4    Identify SDAs
4.5    VIRA Simulator Architecture
4.6    VIRA Full Simulation Flow
V.    EXPERIMENTAL RESULTS
5.1    Performance Comparison
5.2    Full System Simulations Considering Bus Contention Effect
5.3    Full System Performance Estimation
VI.    CONCLUSION
References

