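Application-Driven Design and Evaluation of Network-on-Chip for Many-Core Systems | National Tsing Hua University Theses and Dissertations Repository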


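Graduate student: 黃世傑 (Huang, Shih-Chieh)
Thesis title: Application-Driven Design and Evaluation of Network-on-Chip for Many-Core Systems (original Chinese title: 以應用導向之晶片網路設計及其模擬機制)
Advisor: 金仲達
Oral defense committee: 楊佳玲, 陳添福, 郭峻因, 蔡仁松, 鍾葉青, 金仲達
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 84
Chinese keywords: network-on-chip, many-core, simulation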


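Networks-on-chip (NoCs) have become the main interconnect architecture for many-core systems, so understanding and exploring their characteristics in greater depth has become an important topic in recent research. In this thesis, we take the application's point of view: we study the characteristics of many-core applications in depth and examine how these characteristics relate to the NoC. By exploiting the regular behavior of applications (such as loops and repeated accesses in space and time), we design NoCs suited to them. The main contributions of this thesis are: 1. using application characteristics to help the NoC achieve better power-saving mechanisms; 2. using application characteristics to design a simulation mechanism that can quickly evaluate NoC performance. Compared with previous approaches, our experiments show that approaching the problem from the application's perspective delivers better performance than approaching it purely from the hardware's perspective.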

As the number of cores on a chip keeps increasing, the Network-on-Chip (NoC) has become the primary choice for interconnecting the nodes, so exploring NoC design is important for fully utilizing the power of the nodes. Unfortunately, when the interconnect fabric is designed, the information available from the applications running on top of the NoC is usually ignored. Instead, designers mostly rely on fixed parameters decided at a very early design stage. Even when dynamic adaptation at runtime is provided, it relies only on simple counters available in the hardware. As a result, the NoC design is usually conservative and may not even fit the target applications in the end.

In fact, information from the application can dramatically improve the design of the NoC. In this thesis, we call this perspective application-driven NoC design. Based on this idea, the thesis addresses two important topics in NoC design. First, we exploit the traffic patterns of the applications running on top of the on-chip network to reduce its power consumption. In previous work, one promising solution is to dynamically adjust the working frequencies and voltages (DVFS) of the switches, as well as of the links between switches, to match the traffic flows. The question is when to adjust and by how much. We propose a hardware mechanism that proactively adjusts the frequencies and voltages of switches and/or links in the NoC by predicting the application's runtime traffic. Unlike previous approaches, our approach predicts the traffic from the implicit program behaviors visible in the network, i.e., loops and repetitive transmissions, which makes the DVFS strategy more proactive and accurate. Second, following the same idea, we propose a causality-aware simulation methodology for more accurate evaluation. By observing application behavior, the causalities between each pair of nodes in the system can be extracted and embedded into the trace logs, so the accuracy of simulation improves considerably compared with conventional trace-driven simulation. Moreover, we take advantage of the causalities to compress the huge trace logs and design a corresponding traffic generator, named Attackboard.
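The proactive DVFS idea described above can be illustrated with a minimal sketch. It is not the thesis's hardware design: the class `PatternTrafficPredictor`, the function `pick_level`, the history length, and the four frequency scales are all hypothetical choices made for illustration. The sketch assumes that loop-driven traffic repeats, so the load that followed a given recent history last time is a reasonable guess for what follows it next time.

```python
from collections import deque


class PatternTrafficPredictor:
    """Toy history-based predictor (illustrative assumption, not the thesis design)."""

    def __init__(self, history_len=2):
        self.history = deque(maxlen=history_len)  # most recent per-interval loads
        self.pattern_table = {}                   # history tuple -> load that followed it

    def predict(self):
        """Predict the next interval's load; fall back to the last observed value."""
        key = tuple(self.history)
        if key in self.pattern_table:
            return self.pattern_table[key]
        return self.history[-1] if self.history else 0

    def update(self, observed):
        """Learn which load followed the current history, then shift the history."""
        key = tuple(self.history)
        if len(key) == self.history.maxlen:
            self.pattern_table[key] = observed
        self.history.append(observed)


def pick_level(predicted_load, capacity, levels=(0.25, 0.5, 0.75, 1.0)):
    """Return the lowest frequency scale whose bandwidth still covers the load."""
    for scale in levels:
        if predicted_load <= scale * capacity:
            return scale
    return levels[-1]


# A loop that alternates light and heavy communication phases.
predictor = PatternTrafficPredictor(history_len=2)
for load in [10, 80, 10, 80]:
    predictor.update(load)

next_load = predictor.predict()
next_scale = pick_level(next_load, capacity=100)
```

Because the level is chosen from the predicted load before the interval starts, the link can be slowed ahead of a light phase and sped up ahead of a heavy one, rather than reacting after a counter has already crossed a threshold.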
Attackboard records the interactions between each processor-router pair as several small causality-aware tables and dynamically generates traffic injections into the NoC on the fly to mimic the real application behavior at runtime. The resulting traffic is similar to that generated by the original trace, but with much lower space overhead.
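To make the difference between conventional and causality-aware trace replay concrete, here is a minimal sketch. It does not reproduce Attackboard's table format: the record fields `id`, `dep`, and `delay`, the function `replay`, and the constant-latency network models are hypothetical. The only assumption carried over from the abstract is that a packet's injection can depend on the arrival of an earlier packet.

```python
def replay(trace, latency):
    """Replay a dependency-annotated trace.

    trace: list of dicts with keys "id", "dep" (id of the packet whose arrival
           triggers this one, or None), and "delay" (cycles of local work
           before injection). Each packet is listed after its dependency.
    latency: function(packet) -> cycles the simulated network needs to deliver it.
    Returns {packet_id: (inject_time, arrive_time)}.
    """
    times = {}
    for pkt in trace:
        # A packet becomes ready only when the packet it depends on has arrived.
        ready = 0 if pkt["dep"] is None else times[pkt["dep"]][1]
        inject = ready + pkt["delay"]
        times[pkt["id"]] = (inject, inject + latency(pkt))
    return times


# Hypothetical request -> reply -> follow-up request chain.
trace = [
    {"id": 0, "dep": None, "delay": 5},
    {"id": 1, "dep": 0, "delay": 3},
    {"id": 2, "dep": 1, "delay": 4},
]

fast = replay(trace, latency=lambda pkt: 10)
slow = replay(trace, latency=lambda pkt: 20)
```

With the slower network, the reply and the follow-up request are injected later because they wait for their triggers. A replay driven by fixed timestamps would inject all three packets at the recorded times regardless of the network under test, which is the inaccuracy the causality information is meant to remove.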

Introduction
Application-Driven End-to-End Traffic Predictions for Low Power NoC Design
1 A Motivating Example
2 Problem Formulation
3 System Design of ATPT
3.1 Basic Ideas
3.1.1 Design of prediction tables
3.2 Design of Selector for Predictors
4 Link-DVFS by ATPT-Based Predictor
4.1 Superposing Link Utilization
4.2 Strategies for Link-DVFS
4.3 Discussions
5 Implementation Considerations
5.1 Table Implementations
5.1.1 Reducing the size of the 1st-level table
5.1.2 Sharing the 2nd-level table
5.1.3 Quantizing the values in the 2nd-level table
5.1.4 Entry replacement in the 2nd-level table
5.2 Aggregation for Traffic Predictions
5.2.1 Naïve broadcasting
5.2.2 Control network
5.2.3 Packet piggybacking
5.3 Area Occupancy
5.4 Energy Consumption of ATPT-Based Predictors
5.4.1 Energy consumption of ATPTs
5.4.2 Energy consumption of DVFS switching
6 Evaluation
6.1 Evaluation for ATPT-Based Predictors
6.1.1 Pattern learning and misprediction recovery
6.2 A Case Study for DVFS on Links
6.2.1 Simulation setup
6.2.2 Methodology
6.2.3 The accuracy of link voltage and frequency level adjustment
6.2.4 Communication links power reduction results
7 Related Works
8 Summary
Attackboard: A Novel Dependency-Aware Traffic Generator for Exploring NoC Design Space
1 Introduction
2 Related Works
3 Methodology
3.1 System Architecture
3.2 Structure of Attackboard’s attackboard
3.3 Packets Dependencies Extraction
3.4 Simulation Methodology
4 Experimental Evaluation
4.1 Simulation Setup
4.2 Methodology
4.3 Evaluation Results
4.3.1 Storage space overhead
4.3.2 Traffic behavior
5 Case Study
5.1 NoC Design Space Exploration
5.2 Parallel Object Detection
6 Summary
Conclusion


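Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)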
