Graduate student: | 吳孟寰 Wu, Meng-Huan |
---|---|
Thesis title: | 一種針對多核心指令集模擬之高平行度時間同步技術 A High-Parallelism Synchronization Technique for Multi-Core Instruction-Set Simulation |
Advisor: | 蔡仁松 Tsay, Ren-Song |
Oral defense committee: | 蔡仁松 Tsay, Ren-Song; 鍾葉青 Chung, Yeh-Ching; 許雅三 Hsu, Yarsun; 蘇泓萌 Su, Hong-Meng; 游本中 Yew, Pen-Chung |
Degree: | Doctoral |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2011 |
Academic year of graduation: | 99 (ROC calendar) |
Language: | English |
Number of pages: | 72 |
Keywords (Chinese): | 時間同步、分散式同步、多核心模擬、指令集模擬 |
Keywords (English): | Timing Synchronization, Distributed Synchronization, Multi-Core Simulation, Instruction-Set Simulation |
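As multi-core systems have become the mainstream of today's designs, designers need corresponding multi-core instruction-set simulators to aid system development. A multi-core simulator can be composed of several independent single-core simulators. Ideally, the multi-core simulation can be run in parallel to speed it up. However, the conventional centralized timing synchronization method greatly limits this parallelism, making it hard to raise the simulation speed. To solve this problem, this study proposes a new distributed synchronization method that increases the parallelism of multi-core simulation and thereby improves the overall simulation speed. With distributed scheduling, each simulator operates independently, so more simulators can execute at the same time. In addition, our method predicts when future sync points are likely to occur and uses the prediction to shorten the waiting time spent on each synchronization. This yields higher parallelism and, in turn, better overall multi-core simulation performance. According to our experimental results, under the same shared-memory-based synchronization, the proposed distributed technique improves simulation performance by 9 to 20 times over the conventional centralized scheduling approach as the number of simulated cores increases.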
As multi-core architecture has become the mainstream, a corresponding multi-core instruction-set simulation (MCISS) is needed to aid system development. Ideally, an MCISS can be run in parallel to increase simulation speed. However, the conventional centralized timing synchronization mechanism greatly constrains the parallelism of an MCISS, so the simulation speed is bounded. To resolve this issue, we propose a new distributed timing synchronization technique that allows higher parallelism for a high-speed MCISS. By allowing each ISS to schedule with the others independently, more ISSs can run in parallel. Furthermore, the proposed distributed technique predicts the possible time of each ISS's future sync point (i.e., the time point for synchronization). Based on this prediction, the time spent on synchronization is effectively reduced, leading to higher parallelism and better simulation performance of an MCISS. The experimental results show that, for the same shared-memory-based synchronization, our distributed technique improves the simulation performance by 9 to 20 times over the conventional centralized approach as the number of cores increases.
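The idea of letting each ISS decide locally, from predicted sync points, whether it may proceed can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the thesis's algorithm: the `Iss` class, `predicted_next_sync`, `can_proceed`, and `run` are hypothetical names, the prediction is assumed to be exact (a real predictor would presumably supply a bound), and the loop is single-threaded for determinism.

```python
from dataclasses import dataclass


@dataclass
class Iss:
    """Hypothetical per-core simulator state (illustrative only)."""
    name: str
    sync_points: list  # simulated times of shared-memory accesses, ascending

    def predicted_next_sync(self):
        # Assumption: the prediction is exact. With no sync points left,
        # this ISS can never block anyone, so report infinity.
        return self.sync_points[0] if self.sync_points else float("inf")


def can_proceed(me, others):
    """Local check, no central scheduler: `me` may pass its sync point if
    no other ISS is predicted to synchronize earlier in simulated time.
    Ties are broken by name so that exactly one ISS wins."""
    mine = (me.predicted_next_sync(), me.name)
    return all(mine <= (o.predicted_next_sync(), o.name) for o in others)


def run(isses):
    """Let every ISS that passes its own local check commit its sync point,
    and return the resulting commit order as (time, name) pairs."""
    order = []
    while any(i.sync_points for i in isses):
        ready = [i for i in isses
                 if i.sync_points
                 and can_proceed(i, [o for o in isses if o is not i])]
        for i in ready:
            order.append((i.sync_points.pop(0), i.name))
    return order


if __name__ == "__main__":
    print(run([Iss("a", [5, 20]), Iss("b", [10, 15])]))
```

In this toy model each simulator only compares its own next sync point with the values published by the others, so shared-memory accesses are committed in simulated-time order without a global scheduler; between sync points the simulators would be free to run concurrently.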