
Graduate student: Chien, Chia-Hung (簡嘉宏)
Thesis title: A Separate Code Cache Model for a Parallel Multi-Core System Emulator Based on QEMU
Advisor: Chung, Yeh-Ching (鍾葉青)
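Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2010
Academic year of graduation: 99 (ROC calendar)
Language: English
Number of pages: 54
Keywords (translated from Chinese): binary code cache, parallel, system emulator, multi-core, QEMU
Keywords (English): Code Cache, parallel, system emulator, Multi-core, QEMU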
    QEMU is a fast processor emulator that uses dynamic binary translation to achieve high emulation efficiency. With QEMU, operating systems and programs built for one ISA can run on a machine with a different ISA. However, the current design of QEMU is suitable only for single-core processor emulation. When a multi-threaded application is executed on a multi-core machine, QEMU emulates it serially and cannot exploit the parallelism available in the application or in the underlying hardware. In this work, we propose a novel multi-threaded design of QEMU, called P-QEMU, which can effectively deploy multiple simulated virtual CPUs on the underlying multi-core machine. The main idea of the design is to add a Separate Code Cache model to the execution flow of QEMU. To evaluate the design, we emulate an ARM11 MPCore by running P-QEMU on a quad-core x86 i7 system and use SPLASH-2, PARSEC, and CoreMark as benchmarks. The experimental results show that, for the SPLASH-2 benchmark suite, P-QEMU is on average 3.79 times faster than QEMU and scales on the quad-core i7 system.
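The Separate Code Cache idea named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis or from the QEMU source: the class `SeparateCodeCache`, the helper `fake_translate`, and the guest program counter trace are all hypothetical, and the sketch assumes that "separate" means one private translation cache per virtual CPU, so that each emulation thread can look up and insert translated blocks without locking a shared structure.

```python
import threading


class SeparateCodeCache:
    """Hypothetical model: one private translation cache per virtual CPU."""

    def __init__(self, num_vcpus):
        # caches[i] maps a guest program counter to the block translated by vCPU i.
        self.caches = [dict() for _ in range(num_vcpus)]
        # translations[i] counts how many blocks vCPU i had to translate.
        self.translations = [0] * num_vcpus

    def lookup_or_translate(self, vcpu_id, guest_pc, translate):
        # Only the owning thread touches caches[vcpu_id], so no lock is taken.
        cache = self.caches[vcpu_id]
        block = cache.get(guest_pc)
        if block is None:
            block = translate(guest_pc)
            cache[guest_pc] = block
            self.translations[vcpu_id] += 1
        return block


def fake_translate(guest_pc):
    # Stand-in for dynamic binary translation; returns a label, not host code.
    return f"host_code@{guest_pc:#x}"


def run_vcpu(model, vcpu_id, trace):
    # Each emulation thread walks its guest PC trace through its own cache.
    for guest_pc in trace:
        model.lookup_or_translate(vcpu_id, guest_pc, fake_translate)


def emulate(num_vcpus, trace):
    model = SeparateCodeCache(num_vcpus)
    threads = [
        threading.Thread(target=run_vcpu, args=(model, vcpu_id, trace))
        for vcpu_id in range(num_vcpus)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return model


if __name__ == "__main__":
    model = emulate(4, [0x1000, 0x1004, 0x1000, 0x1008])
    print(model.translations)
```

In this sketch the same guest block is translated once per virtual CPU, so lock-free lookups are paid for with duplicated translations and a larger memory footprint. A single shared cache would avoid the duplication but would need synchronization around every insert.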

    Chapter 1 Introduction
    Chapter 2 Related Work
    Chapter 3 Sequential QEMU (S-QEMU)
        3.1 Emulation of VCPUs
        3.2 Memory Access Emulation
        3.3 I/O Interrupt Delivery Model
    Chapter 4 The Design of P-QEMU
        4.1 Emulation thread
        4.2 DBT Synchronization Model
            4.2.1 Global common resource analysis
            4.2.2 DBT Event Synchronization Table before the partitioning
        4.3 Global common resource partitioning
            4.3.1 Separate TCG
            4.3.2 The rest partitioning
        4.4 Architectural concerns
            4.4.1 Atomic instructions
            4.4.2 Interrupt delivery model
        4.5 Unified Code Cache versus Separate Code Cache
        4.6 Extra Memory Footprint Evaluation
    Chapter 5 Experimental Results
        5.1 Performance Evaluation
        5.2 Scalability Evaluation
        5.3 Internal Event Profiling
            5.3.1 Sync events versus Unsync events
            5.3.2 Unified Code Cache versus Separate Code Cache
        5.4 Throughput Evaluation
        5.5 Flush Overhead Evaluation
        5.6 COREMU versus P-QEMU
    Chapter 6 Conclusions and Future Work
    References


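    Full-text release date: the full text is not authorized for public release (on-campus network)
    Full-text release date: the full text is not authorized for public release (off-campus network)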
