Graduate student: | 邱奕楠 Chiu, Yih-Nan |
---|---|
Thesis title: | 基於晶片網路之單晶片多處理器的平行模擬器 TileSim+: A Parallel Trace-driven Simulator for NoC-based Cache-coherent CMP on TILERA 64 |
Advisor: | 金仲達 King, Chung-Ta |
Oral defense committee: | 黃婷婷 Hwang, Ting-Ting; 徐慰中 Hsu, Wei-Chung |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2012 |
Academic year of graduation: | 100 (ROC calendar) |
Language: | English |
Number of pages: | 38 |
Keywords (Chinese): | parallel simulator, network-on-chip |
Keywords (English): | TileSim+, parallel trace-driven, NoC-based, Cache-coherent, TILERA64 |
Chip multiprocessors (CMPs) are becoming the norm for processor chips. Trace-driven simulation is widely used in CMP design because it allows fast exploration of the architecture design space. With parallel computers of up to 64 cores now readily available, such as Tilera's TILE64, running trace-driven simulation in parallel for even faster architecture evaluation has become feasible. However, very few papers discuss parallel trace-driven simulation.

This thesis presents the design and implementation of TileSim+, a parallel trace-driven simulator for NoC-based cache-coherent CMPs. TileSim+ provides a cycle-accurate network model and a cycle-count-accurate cache simulation model, which together allow not only precise evaluation of memory access delay but also exploration of the cache design space for NoC-based CMPs. Most importantly, running on a parallel machine such as Tilera's TILE64, TileSim+ speeds up trace-driven simulation while retaining good scalability. Experimental evaluation of TileSim+ on the TILE64 shows that it produces correct simulation results for the tested benchmark programs and achieves good speedup over a sequential simulator. We also demonstrate how to use TileSim+ to evaluate CMP cache designs.
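
For readers unfamiliar with the term, the idea of trace-driven cache simulation can be illustrated with a minimal sketch. This is not TileSim+ code: the direct-mapped cache, the `(core_id, address)` trace format, the names `DirectMappedCache` and `replay`, and the latency constants are all assumptions chosen for illustration. The sketch is sequential and models neither the NoC nor cache coherence.

```python
from dataclasses import dataclass, field

HIT_CYCLES = 1     # assumed hit latency, not a TileSim+ parameter
MISS_CYCLES = 100  # assumed miss penalty, not a TileSim+ parameter


@dataclass
class DirectMappedCache:
    """Hypothetical private cache: one tag per set, no coherence."""
    num_sets: int = 4
    line_bytes: int = 64
    tags: dict = field(default_factory=dict)  # set index -> stored tag
    hits: int = 0
    misses: int = 0

    def access(self, address: int) -> int:
        """Replay one memory reference and return its latency in cycles."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        if self.tags.get(index) == tag:
            self.hits += 1
            return HIT_CYCLES
        self.tags[index] = tag  # fill the line, evicting any old tag
        self.misses += 1
        return MISS_CYCLES


def replay(trace):
    """Feed (core_id, address) trace records to one cache per core."""
    caches = {}
    cycles = {}
    for core_id, address in trace:
        cache = caches.setdefault(core_id, DirectMappedCache())
        cycles[core_id] = cycles.get(core_id, 0) + cache.access(address)
    return caches, cycles


# A tiny hand-written trace; a real trace would be recorded from a program.
trace = [(0, 0x000), (0, 0x004), (0, 0x100), (0, 0x000), (1, 0x040)]
caches, cycles = replay(trace)
for core_id in sorted(caches):
    c = caches[core_id]
    print(core_id, c.hits, c.misses, cycles[core_id])
```

Because the simulator consumes a recorded trace rather than executing the program, changing `num_sets` or `line_bytes` and replaying the same trace is enough to compare cache configurations, which is the kind of design-space exploration the abstract refers to.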