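Graduate student: 陳韋廷
Chen, Wei-Ting
Thesis title: 用於多核心平台的晶片上網路之容錯設計與分析
Fault-Tolerant NoC Design and Analysis for Many-Core Platform
Advisor: 黃稚存
Huang, Chih-Tsun
Committee members: 劉靖家
Liou, Jing-Jia
黃俊達
Huang, Juinn-Dar
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2013
Academic year of graduation: 102 (ROC calendar)
Language: English
Number of pages: 63
Keywords (Chinese, translated): fault tolerance, many-core, network-on-chip, inter-PE communication, system-on-chip
Keywords (English): inter-PE communication, local channel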
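  • With advances in System-on-Chip (SoC) technology, many-core processors are becoming increasingly important. Their communication infrastructure is implemented as a Network-on-Chip (NoC), which contains a large number of switches and interconnects that form a structure spanning the chip. As the number of on-chip components grows, modern many-core systems become especially prone to faults. In a NoC, even a single broken channel can stop part of the communication or stall it entirely, leaving the whole chip unusable. Improving yield has therefore become an important issue.
    We propose a method for analyzing and improving the fault tolerance of NoC architectures, which is a necessary step for future technologies. We propose a self-repair method: we add channels between adjacent processing elements to implement a fault-tolerant NoC, which significantly improves the system yield. We use a simple RC formula to calculate the wire delay of the new channel; the delay is under 1 ns and is therefore affordable. In addition, the new channel replaces the original transfer path between adjacent processing elements, which reduces transmission time. We design and analyze the fault-tolerant NoC on a SystemC platform. The many-core platform has 16 processing elements, and we add two to four OCP interfaces to each one to connect it to its adjacent processing elements. The experimental results show that the fault-tolerant NoC achieves good yield, and the SPLASH2 benchmarks show that the added latency is only about 1% to 2% when channels are broken.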


    As System-on-Chip (SoC) technology improves, many-core processors are becoming more and more important. Their communication infrastructure will be implemented as a Network-on-Chip (NoC), which contains a large number of switches and interconnects that form a structure spanning the chip. Unfortunately, with an increasing number of on-chip components expected to be defective in near-future chips, modern parallel systems such as many-core systems become especially vulnerable to these faults. A single broken channel in the NoC may cause part of the communication to stop or even deadlock, rendering the chip useless. Improving the yield of the NoC is therefore an important issue.
    In this thesis, we present an approach for analyzing and improving the fault tolerance of NoC architectures. This is a necessary step toward implementing reliable systems in future technologies. We propose a self-repair method: adding a local channel between adjacent Processing Elements (PEs) to implement a fault-tolerant NoC, which significantly improves the yield of the system. We use a simple RC formulation to calculate the wire delay of the local channel. The wire delay is under 1 ns, so the local channel is affordable. Because a local channel connects only adjacent PEs, which are close together on the mesh, the complexity of routing on the chip is not a concern. In addition, the local channel between adjacent PEs can reduce the transaction time.
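    As a rough illustration of what a simple RC wire-delay estimate can look like, the sketch below uses the common Elmore-style approximation for a driver, a distributed RC wire, and a capacitive load. The choice of formula, the function name `wire_delay_s`, and every parameter value are assumptions made for illustration only; none of them are taken from the thesis.

```python
def wire_delay_s(length_mm, r_ohm_per_mm, c_f_per_mm,
                 r_driver_ohm=0.0, c_load_f=0.0):
    """Approximate 50% delay (seconds) of a driver + distributed RC wire + load.

    Elmore-style estimate:
      0.69 * Rd * (Cw + CL)   driver charging the wire and the load
    + 0.38 * Rw * Cw          distributed wire charging itself
    + 0.69 * Rw * CL          wire resistance charging the load
    """
    r_wire = r_ohm_per_mm * length_mm   # total wire resistance
    c_wire = c_f_per_mm * length_mm     # total wire capacitance
    return (0.69 * r_driver_ohm * (c_wire + c_load_f)
            + 0.38 * r_wire * c_wire
            + 0.69 * r_wire * c_load_f)


if __name__ == "__main__":
    # Placeholder values (assumptions, not measured data): a 2 mm wire,
    # 200 ohm/mm, 0.2 pF/mm, a 1 kohm driver and a 10 fF load.
    delay = wire_delay_s(2.0, 200.0, 0.2e-12, 1000.0, 10e-15)
    print(f"estimated delay: {delay * 1e12:.1f} ps")
    print("under 1 ns" if delay < 1e-9 else "1 ns or more")
```

    With no driver or load, the wire term grows with the square of the length, which is why keeping the local channel between adjacent PEs short matters for the delay budget.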
    We design and analyze the fault-tolerant NoC on a many-core ESL simulation platform in SystemC. This ESL platform has sixteen NoC-based Processing Elements (PEs). We add two to four OCP interfaces to each PE for the local channels to adjacent PEs. Details of this architecture are given in the thesis. The experimental results show the yield of the fault-tolerant NoC and the latency overhead when channels are broken. The SPLASH2 applications show that the latency overhead is about 1% to 2% when channels are broken.
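    To make the yield argument concrete, the following sketch estimates by Monte Carlo sampling how extra channels between adjacent PEs can raise the fraction of usable chips on a 4x4 mesh. It is a simplified model under stated assumptions and not the thesis's fault model: every channel fails independently with the same probability, the baseline chip counts as good only when every NoC channel works, and the fault-tolerant chip counts as good when all PEs can still reach each other. All function names are hypothetical.

```python
import random


def build_links(n=4):
    """Return (noc_links, local_links) for an n x n mesh.

    Nodes are ("R", x, y) for routers and ("P", x, y) for PEs.
    """
    noc, local = [], []
    for x in range(n):
        for y in range(n):
            noc.append((("P", x, y), ("R", x, y)))  # PE-to-router channel
            for dx, dy in ((1, 0), (0, 1)):
                nx, ny = x + dx, y + dy
                if nx < n and ny < n:
                    noc.append((("R", x, y), ("R", nx, ny)))    # router-to-router
                    local.append((("P", x, y), ("P", nx, ny)))  # assumed local channel
    return noc, local


def pes_connected(links, n=4):
    """True if every PE is reachable from PE (0, 0) over the given links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    start = ("P", 0, 0)
    seen, stack = {start}, [start]
    while stack:  # depth-first search
        for nxt in adj.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return all(("P", x, y) in seen for x in range(n) for y in range(n))


def estimate_yield(p_fail, trials=2000, n=4, seed=0):
    """Return (baseline_yield, fault_tolerant_yield) estimated by sampling."""
    rng = random.Random(seed)
    noc, local = build_links(n)
    base_ok = ft_ok = 0
    for _ in range(trials):
        noc_alive = [link for link in noc if rng.random() >= p_fail]
        local_alive = [link for link in local if rng.random() >= p_fail]
        if len(noc_alive) == len(noc):   # baseline needs every NoC channel
            base_ok += 1
        if pes_connected(noc_alive + local_alive, n):
            ft_ok += 1
    return base_ok / trials, ft_ok / trials


if __name__ == "__main__":
    base, ft = estimate_yield(0.02)
    print(f"baseline yield: {base:.3f}, with local channels: {ft:.3f}")
```

    Both estimates come from the same samples, so the fault-tolerant figure can never fall below the baseline in this model. The reachability test is only a proxy for usability, since it ignores deadlock and the rerouting policy.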

    1 Introduction
    1.1 Introduction of Many-Core platform
    1.2 Motivation
    1.3 Contribution
    1.3.1 Local Channel between Adjacent-PE
    1.3.2 Fault-Tolerant NoC with local channel
    1.4 Thesis Organization
    2 Previous Work
    2.1 Background
    2.1.1 SystemC
    2.1.2 OSCI TLM-2.0
    2.1.3 OpenRISC
    2.1.4 Open Core Protocol
    2.1.5 Transaction Generator 2
    2.2 The ESL Many-Core Platform
    2.2.1 Overview
    2.2.2 Network-on-Chip
    2.2.3 Processing Element
    2.2.4 Communication Unit
    2.3 Software Library
    2.4 Existing Methods for Improvement of Fault-Tolerant NoC
    3 Proposed Fault-tolerant Architecture
    3.1 Local Channel between Adjacent-PE
    3.1.1 Many-Core Platform
    3.1.2 Processing Element
    3.1.3 Communication Unit
    3.1.4 Wire Delay Assessment
    3.1.5 Latency of Data Transmission
    3.2 Fault-Tolerant NoC with local channel
    3.2.1 Matrix Represents the Channel of the Many-Core Platform
    3.2.2 Fault Module
    3.2.3 Yield
    3.2.4 Reroute Architecture
    4 Experimental Results
    4.1 Overview of Experiment Environment
    4.2 Yield of Fault-Tolerant NoC Analysis
    4.3 Random Send Test Analysis
    4.4 Odd-Even Sort Analysis
    4.5 The SPLASH2 Analysis
    4.6 Incidental Result and Overhead of Local Channel
    5 Conclusion and Future Work
    5.1 Conclusion
    5.2 Future work


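    Full-text release date: the full text is not authorized for public release (on-campus network)
    Full-text release date: the full text is not authorized for public release (off-campus network)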