Graduate student: | Li, June-Yi (李峻毅) |
---|---|
Thesis title: | A Non-volatile Computing-In-Memory ReRAM Macro with Multiply-and-Accumulate for Binary DNN AI Edge Processors (應用於二進制人工智慧之深層神經網路邊緣處理器以非揮發性電阻式記憶體為基礎之記憶體內進行乘法與累加運算) |
Advisor: | Chang, Meng-Fan (張孟凡) |
Oral defense committee: | Liu, Ren-Shuo (呂仁碩); Tang, Kea Tiong (鄭桂忠) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Graduate Program in Integrated Circuit Design and Process Development |
Year of publication: | 2019 |
Academic year of graduation: | 107 (ROC calendar) |
Language: | English |
Number of pages: | 34 |
Keywords (translated from Chinese): | nonvolatile computing-in-memory, deep neural network |
Keywords (English): | nv-CIM, Deep-Neuron-Network |
With the rapid development of deep neural networks and the widespread adoption of mobile devices, the volume of data that must be computed and transmitted has grown rapidly. However, the conventional von Neumann architecture is limited by data-transfer speed, hardware cost, and energy consumption. Computing-in-memory (CIM) has therefore become a promising solution, because the data exchange that a conventional architecture requires can be carried out inside the memory itself.
Deep neural networks (DNNs) typically consist of m convolutional layers (convolutional neural network, CNN) and n fully connected layers (fully connected network, FCN). Besides the long access time incurred by large data sets, the multiply-and-accumulate (MAC) operations of the CNN and FCN layers also generate large amounts of intermediate data. Performing computation inside a high-density nonvolatile memory (NVM) therefore avoids the data-transfer bottleneck of the von Neumann architecture, reduces the movement of intermediate data, and shortens the time needed for MAC operations within one memory cycle, which greatly increases DNN operating speed.
Using a 0.18 µm CMOS logic process, this work demonstrates, in simulation, a 1 Mb ReRAM-based computing-in-memory macro that accelerates CNN operations by computing the product sum of binary inputs and ternary weights (BITW). With 3 x 3 kernels, one MAC operation has a read (access) time of 37.5 ns in 2-bit output mode and 50 ns in 3-bit output mode.
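As an illustration of the product sum described in the abstract, the following is a minimal software sketch of one MAC operation over a 3 x 3 kernel with binary inputs and ternary weights. It is not the thesis's circuit: the function name `bitw_product_sum`, the encoding of binary inputs as 0/1 and ternary weights as -1/0/+1, and the sample values are all assumptions made for this example. How the macro maps the result onto its 2-bit or 3-bit output is not modeled here.

```python
from typing import Sequence


def bitw_product_sum(inputs: Sequence[int], weights: Sequence[int]) -> int:
    """Product sum of binary inputs and ternary weights.

    Assumed encoding (not taken from the thesis): inputs are 0 or 1,
    weights are -1, 0, or +1.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if any(x not in (0, 1) for x in inputs):
        raise ValueError("inputs must be binary (0 or 1)")
    if any(w not in (-1, 0, 1) for w in weights):
        raise ValueError("weights must be ternary (-1, 0, or +1)")
    # Multiply element-wise and accumulate: the MAC operation.
    return sum(x * w for x, w in zip(inputs, weights))


# A hypothetical 3 x 3 input window and kernel, flattened row by row.
window = [1, 0, 1,
          1, 1, 0,
          0, 1, 1]
kernel = [1, -1, 0,
          0, 1, 1,
          -1, 0, 1]

result = bitw_product_sum(window, kernel)
print(result)
```

Under this encoding, nine inputs and nine weights give a product sum between -9 and +9 inclusive.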