A Synchronization-Function-Based TLM Approach for Parallel Multi-Core Instruction-Set Simulations


Author:	Pai, Hsien-Lun (白憲倫)
Thesis title:	A Synchronization-Function-Based TLM Approach for Parallel Multi-Core Instruction-Set Simulations (利用同步程序之交易層級之平行多核心指令集模擬方法)
Advisor:	Tsay, Ren-Song (蔡仁松)
Oral defense committee:	謝明得, 周志遠
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2013
Academic year of graduation:	101 (ROC calendar)
Language:	English
Number of pages:	60
Keywords (Chinese):	平行系統模擬, 時間同步
Keywords (English):	Parallel system simulation, Timing synchronization

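To keep pace with increasingly prevalent multi-core computing platforms, a multi-core instruction-set simulator is essential. Parallel computing techniques can speed up such simulators, but they often suffer from poor accuracy: the cores are simulated at different speeds, so the results of interactions among cores come out incorrect. To solve this problem effectively, we devised a high-performance transaction-level approach, based on synchronization functions, for instruction-set simulation of parallel multi-core platforms. Synchronization functions are the routines that multithreaded programs use to coordinate their order of execution. The approach builds on transaction-level modeling: we set the transaction boundary at each synchronization-function call, which is also the point at which different cores interact, so the many instructions between two consecutive synchronization-function calls can be treated as a single transaction. With a blocking/non-blocking send/receive model of synchronization functions and a suitable timing-synchronization method, the time and order of every transaction can be maintained correctly and efficiently. Furthermore, we call a transaction that involves communication among cores a "public transaction" and one that does not a "private transaction." The time and order of public transactions must be maintained, whereas the order of private transactions does not affect simulation accuracy; exploiting this property improves the performance of the approach further. Our experimental results show that the approach reaches a simulation speed of 549 million instructions per second, three times as fast as the latest "shared-variable" approach, while producing timing and functional results as accurate as those of cycle-accurate approaches.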

We describe a highly efficient transaction-level modeling (TLM) technique for parallel multi-core instruction-set simulation (MCISS). We set every call to a synchronization function as a transaction boundary, because these calls dictate the interactions among applications running on different CPU cores. Using a generic blocking/non-blocking send/receive model of synchronization functions together with proper timing synchronization, we can precisely determine the temporal order of each transaction and hence efficiently compute accurate simulation results. Our experiments show that the proposed approach attains a simulation speed of up to 549 MIPS, three times as fast as the state-of-the-art shared-variable-access approach, while producing timing and functional results as accurate as those from cycle-accurate approaches.
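The idea of treating synchronization-function calls as transaction boundaries can be illustrated with a small sketch. This is not the thesis's implementation: the `Core` and `Channel` classes, the cycle counts, and the rule that a blocking receive completes at the later of the receiver's local time and the matching send time are all assumptions made for illustration, and the sketch runs sequentially rather than in parallel.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical simulated core with its own local clock (in cycles)."""
    name: str
    clock: int = 0

    def run(self, cycles: int) -> None:
        # One transaction: all instructions between two synchronization calls.
        self.clock += cycles


@dataclass
class Channel:
    """Hypothetical synchronization object holding timestamped tokens."""
    send_times: list = field(default_factory=list)

    def send(self, core: Core) -> None:
        # Non-blocking send: record the simulated time the token is posted.
        heapq.heappush(self.send_times, core.clock)

    def blocking_receive(self, core: Core) -> None:
        # Blocking receive: the receiver cannot pass this point before the
        # token exists in simulated time, so its clock moves up if needed.
        # Assumes the matching send has already been simulated.
        t_send = heapq.heappop(self.send_times)
        core.clock = max(core.clock, t_send)

    def try_receive(self, core: Core) -> bool:
        # Non-blocking receive: succeeds only if a token was posted at or
        # before the receiver's current simulated time.
        if self.send_times and self.send_times[0] <= core.clock:
            heapq.heappop(self.send_times)
            return True
        return False


producer, consumer = Core("cpu0"), Core("cpu1")
ch = Channel()

producer.run(120)                   # transaction on cpu0
ch.send(producer)                   # token posted at simulated time 120

consumer.run(50)                    # transaction on cpu1
early = ch.try_receive(consumer)    # token not yet visible at time 50
ch.blocking_receive(consumer)       # cpu1 waits until simulated time 120
consumer.run(30)                    # next transaction on cpu1

print(early, producer.clock, consumer.clock)
```

In this sketch the host can simulate each core's transactions at any speed; only the timestamps exchanged at the synchronization calls decide the simulated order, which is the property the abstract relies on.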

1.    Introduction
2.    Related Work
2.1.    TLM Approaches
2.2.    Timing Synchronization Techniques
3.    The SF-Based TLM
3.1.    The SF-Based Timing Synchronization
3.2.    Function-Level SF Timing Model
3.2.1.    Four Types of SF Functions
3.2.2.    The Delay Calculation of SFs
3.2.3.    The Scheduling of Multiple BR-SFs
3.3.    The SF-Level Simulation Framework
4.    Discussions
4.1.    Applicability of the SF-Based TLM
4.2.    Optimization of the SF-Based TLM
4.3.    Modeling Sophisticated SFs
4.4.    Host-Platform Selection
5.    Experimental Results
5.1.    Performance Evaluation
5.2.    Accuracy and Determinism
5.3.    Host-Platform Evaluation
5.4.    Application Studies
5.4.1.    Debugging
5.4.2.    Performance Profiling
6.    Conclusions
References

                                

[1] D. Burger and T. M. Austin, “The SimpleScalar tool set, version 2.0,” SIGARCH Comput. Archit. News, vol. 25, no. 3, pp. 13–25, Jun. 1997.
[2] M. Rosenblum, “Using the simOS machine simulator to study complex computer systems,” in ACM Trans. Modeling and Computer Simulation, Jan. 1997, pp. 78–103.
[3] M.-H. Wu, C.-Y. Fu, P.-C. Wang, and R.-S. Tsay, “An effective synchronization approach for fast and accurate multi-core instruction-set simulation,” in EMSOFT ’09: Proceedings of the seventh ACM international conference on Embedded Software, 2009, p. 197.
[4] M.-H. Wu, P.-C. Wang, C.-Y. Fu, and R.-S. Tsay, “A high-parallelism distributed scheduling mechanism for multi-core instruction-set simulation,” in 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), 2011, pp. 339-344.
[5] Jason E. Miller et al., “Graphite: A distributed parallel simulator for multicores,” in HPCA ’10: Proceedings of the 16th International Symposium on High-Performance Computer Architecture, Jan. 2010.
[6] G. Zheng, G. Kakulapati, and L. V. Kalé, “BigSim: A parallel simulator for performance prediction of extremely large parallel machines,” in 18th International Parallel and Distributed Processing Symposium (IPDPS), Apr. 2004, p. 78.
[7] S. S. Mukherjee et al., “Wisconsin Wind Tunnel II: A fast, portable parallel architecture simulator,” IEEE Concurrency, vol. 8, no. 4, pp. 12–20, Oct.–Dec. 2000.
[8] M. Monchiero, J. H. Ahn, A. Falcón, D. Ortega, and P. Faraboschi, “How to simulate 1000 cores,” SIGARCH Comput. Archit. News, vol. 37, no. 2, pp. 10–19, 2009.
[9] J. Chen, M. Annavaram, and M. Dubois, “SlackSim: A Platform for Parallel Simulations of CMPs on CMPs,” SIGARCH Comput. Archit. News, vol. 37, no. 2, pp. 20–29, 2009.
[10] Z. Wang et al., “COREMU: a scalable and portable parallel full-system emulator,” in Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, New York, NY, USA, 2011, pp. 213–222.
[11] D. Jefferson, B. Beckman, F. Wieland, L. Blume, and M. Diloreto. Time warp operating system. In Proceedings of the 11th ACM Symposium on Operating System Principles, pages 77–93, 1987.
[12] S. R. Das, R. Fujimoto, K. S. Panesar, D. Allison, and M. Hybinette. GTW: a time warp system for shared memory multiprocessors. In Winter Simulation Conference, pages 1332–1339, 1994.
[13] R. B. Atitallah, S. Niar, S. Meftali, and J. L. Dekeyser, “An MPSoC Performance Estimation Framework Using Transaction Level Modeling,” in RTCSA’ 07.
[14] J. Cornet, F. Maraninchi, and L. M. Contoz, “A Method for the Efficient Development of Timed and Untimed Transaction-Level Models of Systems-on-Chip,” in DATE’08.
[15] D. Chatterjee, A. DeOrio, and V. Bertacco, “Event-Driven Gate-Level Simulation with GP-GPUs,” in DAC’09.
[16] A. Mello, I. Maia, A. Greiner, and F. Pecheux, “Parallel Simulation of SystemC TLM 2.0 Compliant MPSoC on SMP Workstations” in DATE’10.
[17] H. Zeng, M. Yourst, K. Ghose, and D. Ponomarev, “MPTLsim: A Cycle-Accurate, Full-System Simulator for x86-64 Multicore Architectures with Coherent Caches,” SIGARCH Comput. Archit. News, vol. 37, no. 2, pp. 2–9, 2009.
[18] Silberschatz, A., Galvin, P. B. and Gagne, G., “Operating System Principles,” Seventh Edition, John Wiley & Sons, Inc., 2006.
[19] Downey, A. B. The Little Book of Semaphores, Version 2.1.5, available at http://www.greenteapress.com/semaphores
[20] Snir, Marc; Otto, Steve; Huss-Lederman, Steven; Walker, David; Dongarra, Jack (1995) MPI: The Complete Reference. MIT Press Cambridge, MA, USA. ISBN 0-262-69215-5
[21] B.-H. Zeng and R.-S. Tsay, “An Efficient Hybrid Synchronization Technique for Scalable Multi-Core Instruction Set Simulations,” in ASPDAC ’13.
[22] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, “The SPLASH-2 programs: characterization and methodological considerations,” SIGARCH Comput. Archit. News, 1995.
[23] M. Xu, R. Bodik, and M. D. Hill, “A ‘flight data recorder’ for enabling full-system multiprocessor deterministic replay,” in Computer Architecture, 2003.
[24] Cormac Flanagan and Stephen N. Freund, “Type-based race detection for Java,” Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, pp. 219–232, June 18–21, 2000, Vancouver, British Columbia, Canada.
[25] K Nishihara, T Hiramatsu, “Condition variable to synchronize high level communication between processing threads,” US Patent 6,026,427, 2000
[26] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC benchmark suite: Characterization and architectural implications,” in Proc. of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT), October 2008.
[27] Chen-Kang Lo, Li-Chun Chen, Meng-Huan Wu, and Ren-Song Tsay, “Cycle-Count-Accurate Processor Modeling for Fast and Accurate System-Level Simulation,” in DATE’11

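Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)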
