Graduate student: | 陳嘉璟 Chen, Jia-Jing |
---|---|
Thesis title: | 應用於深度學習資料處理以靜態隨機存取記憶體為基礎之記憶體內運算巨集 A SRAM Based Computing-In-Memory Macro for Deep neural network Data Processing |
Advisor: | 張孟凡 Chang, Meng-Fan |
Oral defense committee: | 洪浩喬 Hong, Hao-Chiao; 呂仁碩 Liu, Ren-Shuo |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Electronics Engineering |
Year of publication: | 2018 |
Academic year of graduation: | 107 (ROC calendar) |
Language: | English |
Number of pages: | 49 |
Keywords (Chinese): | 靜態隨機存取記憶體、記憶體內運算 |
Keywords (English): | SRAM, Computing in memory |
With the rapid development of artificial intelligence and deep-learning neural networks, and the widespread use of diverse mobile devices, the volume of data being transferred has grown tremendously. In a conventional Von Neumann architecture, however, most of the energy is consumed in moving data between the processor and memory; this is known as the Von Neumann bottleneck. Computing-in-memory (CIM) has therefore become a development goal with great potential: the data exchange that a conventional system performs between processor and memory is instead completed inside the memory.
A CIM macro acts as a hardware accelerator in the system. It provides both computation and storage, and it outputs the computed result directly rather than transferring the raw data elsewhere to be computed, which reduces data movement. To this end, the design concept of this thesis is to use the neural-network inputs (the feature map) to activate multiple word lines in the memory at the same time, so that the inputs are multiplied by the weights stored in the memory and the products are accumulated inside the array.
This thesis presents a 4Kb 6T SRAM-based CIM macro, configurable from 1 to 8 bits and implemented in a 55nm logic process, to accelerate convolutional neural network (CNN) operations. To achieve high read accuracy, compact area, reconfigurability, and low read energy, this work employs 1) a hybrid structure combining in-array binary product-sum (PS) operations with digital near-array multi-bit PS accumulation; 2) multi-bit column-based weight mapping with serial inputs; and 3) a self-reference multi-level read scheme. The fabricated 55nm 4Kb 6T SRAM-CIM macro improves the figure of merit (FoM) by 4.6-50x over previous works.
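The arithmetic behind techniques 1) and 2) can be illustrated with a small behavioral model. The sketch below is not the circuit described in the thesis: it assumes unsigned inputs and weights, treats each in-array column readout as an exact integer count, and leaves out the self-reference multi-level read scheme entirely. The function names `binary_product_sum` and `cim_mac` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def binary_product_sum(input_bits, weight_bits):
    """Idealized in-array step: each 1-bit input enables one word line, and
    every column sums the stored 1-bit weights on the enabled rows."""
    return input_bits @ weight_bits  # one count per column

def cim_mac(inputs, weights, in_bits=4, w_bits=4):
    """Unsigned multi-bit multiply-accumulate built from binary product-sums.

    inputs:  integers in [0, 2**in_bits), one per row
    weights: integers in [0, 2**w_bits), one per row
    """
    inputs = np.asarray(inputs, dtype=np.int64)
    weights = np.asarray(weights, dtype=np.int64)
    # Column-based mapping (assumed layout): column j stores bit j of each weight.
    weight_cols = (weights[:, None] >> np.arange(w_bits)) & 1
    col_scale = 1 << np.arange(w_bits)  # significance 2**j of each column
    total = 0
    for i in range(in_bits):  # serial input: one input bit plane per cycle
        input_plane = (inputs >> i) & 1
        col_sums = binary_product_sum(input_plane, weight_cols)
        # Digital near-array accumulation: scale columns by 2**j, cycles by 2**i.
        total += int(col_sums @ col_scale) << i
    return total

result = cim_mac([3, 5, 7], [2, 4, 6])
```

Under these assumptions the result matches an ordinary dot product of the same integers, which shows why only binary product-sums need to be evaluated inside the array while bit significance is restored by shift-and-add logic next to it.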