Graduate student: | 吳昕益 Wu, Hsin-I |
---|---|
Thesis title: | 協同行為導向之系統模擬平台應用於平行運作系統設計、除錯 A Virtualization-Assisted Full-System Simulation Approach for the Verification of System Inter-Component Interactions |
Advisor: | 蔡仁松 Tsay, Ren-Song |
Oral defense committee: | 邱瀞德 Chiu, Ching-Te; 李哲榮 Lee, Che-Rung; 蘇培陞 Su, Pei-Sheng; 蘇泓萌 Su, Hung-Meng |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Computer Science |
Year of publication: | 2020 |
Academic year of graduation: | 108 |
Language: | Chinese |
Number of pages: | 64 |
Chinese keywords: | 模型建構、效能分析、模擬、虛擬化 |
English keywords: | Modeling, Performance analysis, Simulation, Virtualization |
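Full-system simulation is essential to embedded system design verification. Its flexibility and early availability make it particularly well suited to exploring and verifying system behavior in the early design stages. However, traditional simulation acceleration approaches usually run into difficulties with performance, accuracy, or scalability. This work therefore proposes the VIRA (VIRtualization-Assisted) approach, which builds a fast, accurate, and scalable full-system simulator to overcome these difficulties. To increase simulation performance, VIRA executes hardware-assisted components on host hardware devices to obtain native-execution performance. To ensure accuracy and correct data dependencies, this work proposes a deterministic timing model that includes bus contention delay. To improve extensibility, VIRA integrates software-modeled components to support newly added functions and uses a fast data pass-through mechanism to reduce the communication overhead between simulated components. We validate the proposed virtualization-assisted full-system simulation by implementing the technique on a commercially available SoC (System-on-Chip) board. Experimental results show that, in addition to producing correct inter-component interaction results, the simulator runs 58 to 625 times faster than a commercial functional simulator.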
This thesis proposes VIRA (VIRtualization-Assisted), a full-system simulation approach that achieves near-real-time performance through hardware acceleration based on virtualization techniques. Traditional acceleration approaches generally cannot capture inter-component interactions because the simulation progress of each component is unpredictable. Our approach leverages an existing hardware virtualization framework and devises three key implementation techniques to achieve fast and accurate full-system simulation. First, it uses the trap mechanism of the virtualization framework to intercept inter-component interactions precisely, without checking every data access, while maintaining a deterministic chronological order of those interactions. Second, VIRA provides accurate system performance estimation for early system-level designs by integrating component timing models, interrupt effects, and bus contention analysis. Third, VIRA achieves near-real-time performance by executing the software-simulated and hardware-simulated components on the same host machine, which minimizes the overhead of inter-component data exchange. We implement the proposed approach on a virtualization-enabled off-the-shelf System-on-Chip board to demonstrate its effectiveness. The experiments show that VIRA always produces deterministic results while running 58~625 times faster than a commercial tool, and that its system performance estimates deviate from real systems by only 3~6%. Moreover, our deterministic full-system simulator is verified to carry an overhead of only 2~57% compared with ideal native execution on the same host hardware devices.
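The idea of keeping inter-component interactions in a deterministic chronological order while accounting for bus contention can be illustrated with a small sketch. It is not taken from the thesis: the `Access` record, the `replay` function, the single shared bus, and the fixed per-access service time are all assumptions made for illustration. The sketch orders accesses by simulated time (with the component id as a tie-breaker) and delays any access that arrives while the bus is still busy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical record of one intercepted bus access (illustrative only)."""
    time: int        # simulated request time, in cycles
    component: int   # id of the requesting component, used as tie-breaker
    service: int     # cycles the access occupies the shared bus


def replay(accesses):
    """Return (component, request time, contention delay, finish time) per access.

    Accesses are processed in order of simulated time, then component id,
    so the result does not depend on the order in which the host delivered them.
    """
    bus_free_at = 0
    results = []
    for a in sorted(accesses, key=lambda a: (a.time, a.component)):
        start = max(a.time, bus_free_at)   # wait while the bus is busy
        delay = start - a.time             # contention delay seen by this access
        bus_free_at = start + a.service    # bus is occupied until this cycle
        results.append((a.component, a.time, delay, bus_free_at))
    return results


# Two components request the bus at cycle 10; a third requests it at cycle 12.
trace = [Access(10, 1, 4), Access(10, 0, 4), Access(12, 2, 2)]
in_order = replay(trace)
shuffled = replay(list(reversed(trace)))  # same accesses, different host arrival order
```

Here `in_order` and `shuffled` are identical, which is the determinism property in miniature: host-side arrival order does not change the simulated outcome. A real simulator would make this decision online as interactions are trapped rather than by sorting a completed trace.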