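Graduate student: | 張鈞皓 |
---|---|
Thesis title: | A Highly Efficient Approach for Synchronization-Function-Level Parallel Multi-Core Instruction-Set Simulations (高效率同步程序層級之平行多核心指令集模擬方法) |
Advisor: | 麥偉基 |
Oral defense committee: | 蔡仁松, 張豐願 |
Degree: | Master |
Department: | Department of Computer Science, College of Electrical Engineering and Computer Science |
Year of publication: | 2014 |
Academic year of graduation: | 102 (ROC calendar, i.e., academic year 2013-2014) |
Language: | Chinese |
Pages: | 54 |
Keywords (Chinese): | 平行多核心指令集模擬 (parallel multi-core instruction-set simulation) |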
In this thesis, we propose a highly efficient and accurate parallel multi-core instruction-set simulation (MCISS) approach that synchronizes timing before each synchronization function call. Synchronization functions are the routines that the threads of a multithreaded program use to coordinate their execution order. By introducing a generic blocking/non-blocking send/receive model that covers all types of synchronization functions, we greatly extend the applicability of the state-of-the-art critical-section-level simulation approach. To reduce synchronization overhead, we also apply optimizations such as a hybrid timing synchronization technique, and we provide an analysis tool that helps choose the host platform that gives the best simulation performance. Experiments show that the proposed approach reaches a simulation speed of up to 272 MIPS and is on average 3 times faster than the shared-variable-level approach, while producing timing and functional results identical to those of cycle-accurate approaches.
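The ordering idea behind synchronization-function-level timing synchronization can be illustrated with a small sequential model. This is a sketch under stated assumptions, not the thesis's implementation: it assumes a hypothetical trace format in which each simulated core is a list of `(compute_cycles, primitive, channel)` entries, and it models every synchronization primitive as a generic send, blocking receive, or non-blocking receive on a token channel. The names `Core`, `simulate`, and `PRIMITIVE_MAP`, and the mapping itself (for example, `mutex_unlock` as a non-blocking send and `mutex_lock` as a blocking receive), are illustrative assumptions. The part that mirrors the abstract is the timing rule: a core may execute a synchronization call only when its simulated timestamp is the smallest among all runnable cores, so synchronization events are applied in simulated-time order regardless of how host threads happen to be scheduled. In a real parallel simulator, the compute segments between calls would run concurrently on host threads; here they are simply added to each core's local clock.

```python
from collections import deque
from dataclasses import dataclass

# Generic synchronization operations (hypothetical names for this sketch).
SEND, RECV, TRY_RECV = "send", "recv", "try_recv"

# Illustrative, assumed mapping of common primitives onto the generic model.
PRIMITIVE_MAP = {
    "mutex_lock": RECV,         # blocking receive of the lock token
    "mutex_trylock": TRY_RECV,  # non-blocking receive
    "mutex_unlock": SEND,       # non-blocking send of the lock token
    "sem_wait": RECV,           # blocking receive of one semaphore token
    "sem_post": SEND,           # non-blocking send of one semaphore token
}


@dataclass
class Core:
    cid: int
    trace: list          # hypothetical format: [(compute_cycles, primitive, channel), ...]
    time: int = 0        # local simulated time in cycles
    pc: int = 0          # index of the next trace entry
    blocked: bool = False

    def done(self) -> bool:
        return self.pc >= len(self.trace)

    def next_sync_time(self) -> int:
        # Simulated time at which this core reaches its next synchronization call.
        return self.time + self.trace[self.pc][0]


def simulate(traces, init_tokens):
    """Return a log of (time, core_id, primitive, channel) for every sync call.

    Every channel used in the traces must appear in init_tokens.
    """
    cores = [Core(i, t) for i, t in enumerate(traces)]
    tokens = dict(init_tokens)
    waiters = {ch: deque() for ch in tokens}
    log = []
    while True:
        ready = [c for c in cores if not c.blocked and not c.done()]
        if not ready:
            break  # all cores finished (or every remaining core is blocked)
        # Timing synchronization: only the runnable core with the smallest
        # next-sync timestamp (ties broken by core id) may perform its call.
        core = min(ready, key=lambda c: (c.next_sync_time(), c.cid))
        cycles, prim, ch = core.trace[core.pc]
        core.time += cycles
        core.pc += 1
        kind = PRIMITIVE_MAP[prim]
        log.append((core.time, core.cid, prim, ch))
        if kind == SEND:
            if waiters[ch]:
                # Pass the token to the earliest blocked receiver, which
                # resumes no earlier than the sender's timestamp.
                waiter = waiters[ch].popleft()
                waiter.time = max(waiter.time, core.time)
                waiter.blocked = False
            else:
                tokens[ch] += 1
        elif tokens[ch] > 0:
            tokens[ch] -= 1  # RECV or TRY_RECV succeeds immediately
        elif kind == RECV:
            core.blocked = True  # wait for a matching SEND
            waiters[ch].append(core)
        # A TRY_RECV with no token simply fails and the core continues.
    return log


traces = [
    [(10, "mutex_lock", "m"), (50, "mutex_unlock", "m")],  # core 0
    [(20, "mutex_lock", "m"), (5, "mutex_unlock", "m")],   # core 1
]
for event in simulate(traces, {"m": 1}):
    print(event)
```

With a single mutex token, core 0 locks at cycle 10; core 1 calls lock at cycle 20 and blocks; core 0 unlocks at cycle 60, which releases core 1 at that time, so core 1's unlock lands at cycle 65. The log records the simulated time of each call, and because each executed call has the minimum pending timestamp, the logged times never decrease. The hybrid timing synchronization optimization mentioned in the abstract is not modeled here.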