
Graduate student: 董志宣 (Tung, Chih-Hsuan)
Thesis title: A Memory Built-In Peer-Repair Architecture for Mesh-Connected Processor Array (網狀連接處理器陣列之記憶體內建同儕修復架構)
Advisor: 吳誠文 (Wu, Cheng-Wen)
Oral defense committee: 黃錫瑜 (Huang, Shi-Yu), 呂學坤 (Lu, Shyue-Kung), 李昆忠 (Lee, Kuen-Jong)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar, 2020-2021)
Language: English
Number of pages: 41
Keywords: embedded memory, memory built-in self-repair (MBISR), memory built-in peer-repair (MBIPR), mesh-connected processor array, memory repair, memory testing, redundancy analysis, SoC, spare allocation, SRAM, yield improvement
    Driven by artificial intelligence (AI) applications in recent years, the mesh-connected processor array has become a popular high-performance AI computing architecture, used in chips such as IBM TrueNorth, Intel Loihi, Cerebras WSE, and Google TPU. Besides the processor cores, the memories in the system play a critical role in performance and power consumption. Embedded memories typically occupy more than two-thirds of the total area of these chips and therefore dominate the yield and reliability of the computing chips. Memory built-in self-repair (MBISR) is considered a feasible solution for testing and repairing the embedded memories in the individual processor cores. However, MBISR does not take advantage of the regular topology and connectivity of the mesh-connected processor array. In this work, we propose a memory built-in peer-repair (MBIPR) architecture that enables a processor core to share its spare memories with the neighboring cores in the mesh-connected array. Experimental results show that the proposed MBIPR achieves a higher repair rate than the original MBISR. Compared with MBISR, MBIPR increases spare utilization by 2.1 to 8.1 times and system lifetime by 2.1 to 7.9 times, at the cost of only about 0.2% to 0.9% additional area.
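The difference between self-repair and peer-repair can be illustrated with a toy model. The sketch below is not the MBIPR hardware or the redundancy analysis used in the thesis; it only assumes that every core has the same number of spare units, that each fault consumes one spare, and that a core may borrow unused spares from its four mesh neighbors in a simple greedy order. The function names, the fault map, and the greedy policy are all hypothetical and chosen for illustration.

```python
from typing import Iterator, List, Tuple

Grid = List[List[int]]


def neighbors(r: int, c: int, rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """Yield the up/down/left/right neighbors of core (r, c) inside the mesh."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def self_repair(faults: Grid, spares_per_core: int) -> Tuple[int, int]:
    """Each core may use only its own spares (assumed MBISR-like behavior).

    Returns (number of repaired cores, number of spares actually used).
    """
    repaired = used = 0
    for row in faults:
        for f in row:
            if f <= spares_per_core:
                repaired += 1
                used += f
    return repaired, used


def peer_repair(faults: Grid, spares_per_core: int) -> Tuple[int, int]:
    """Cores may also borrow leftover spares from direct neighbors (greedy).

    This greedy pass is an illustrative assumption, not an optimal allocation.
    """
    rows, cols = len(faults), len(faults[0])
    # Spares left over after a core covers its own faults.
    free = [[max(spares_per_core - f, 0) for f in row] for row in faults]
    repaired = used = 0
    for r in range(rows):
        for c in range(cols):
            deficit = max(faults[r][c] - spares_per_core, 0)
            nbrs = list(neighbors(r, c, rows, cols))
            if deficit > sum(free[nr][nc] for nr, nc in nbrs):
                continue  # not repairable even with the neighbors' help
            for nr, nc in nbrs:
                take = min(deficit, free[nr][nc])
                free[nr][nc] -= take
                deficit -= take
            repaired += 1
            used += faults[r][c]
    return repaired, used


# Hypothetical 2x2 mesh with 2 spares per core (8 spares in total).
fault_map = [[3, 0],
             [1, 4]]
print(self_repair(fault_map, 2))  # (2, 1)
print(peer_repair(fault_map, 2))  # (4, 8)
```

In this made-up example, self-repair fixes two of the four cores and uses one of the eight spares, while neighbor sharing fixes all four cores and uses every spare. The numbers only show why sharing can raise both repair rate and spare utilization; they say nothing about the figures reported in the thesis.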

    Abstract (in Chinese)
    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Introduction to Mesh-Connected Processor Arrays
      1.4 Organization
    Chapter 2 Background
      2.1 Redundant-Core Repair for Mesh-Connected Processor Arrays
      2.2 Memory Built-In Self-Test (MBIST)
        2.2.1 Controller Architecture
        2.2.2 Sequencer Architecture
        2.2.3 Test Pattern Generator (TPG) Architecture
        2.2.4 BIST Intermediate Description (BID) Format
        2.2.5 BIST Circuit Compilation Flow
      2.3 Memory Built-In Self-Repair (MBISR)
        2.3.1 The Essential Spare Pivoting (ESP) Algorithm
        2.3.2 BRAINS+: The MBIST/R Generator
    Chapter 3 Proposed Method
      3.1 MBIPR for Mesh-Connected Processor Arrays
      3.2 MBIPR Architecture
      3.3 MBIPR in the Test Mode and the Normal Mode
    Chapter 4 Experimental Results
      4.1 Evaluation of Repair Rate
      4.2 Evaluation of Lifetime and Spare Utilization
      4.3 Area Overhead
      4.4 Timing and Power Overhead
      4.5 I/O Overhead
    Chapter 5 Conclusion and Future Work
      5.1 Conclusion
      5.2 Future Work
    Bibliography

