Graduate student: | 黃宗源 Huang, Tsung-Yuan |
---|---|
Thesis title: | A Dual-bit Small Offset Current Sense Amplifier for Non-volatile Computing-In-Memory (應用於非揮發性記憶體內運算架構之雙位元輸出小偏移電流感測放大器) |
Advisor: | 張孟凡 Chang, Meng-Fan |
Oral examination committee: | 洪浩喬 Hong, Hao-Chiao; 邱瀝毅; 謝志成 Hsieh, Chih-Cheng |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Institute of Electronics Engineering |
Year of publication: | 2019 |
Academic year of graduation: | 108 |
Language: | English |
Number of pages: | 57 |
Keywords: | Non-volatile Memory, Sense Amplifier, Computing-In-Memory, Dual-bit, Small Offset |
Non-volatile memory (NVM) is widely used in today's memory market, and Flash memory accounts for the bulk of it. However, Flash memory requires high voltages to program and erase data, its operation speed is slow, and as it scales down it suffers from coupling-noise interference and large threshold-voltage variation. In contrast, emerging non-volatile memories (e.g., ReRAM and STT-MRAM) achieve better performance at lower operating voltages, which makes them candidates to replace Flash memory.
With the rapid development of deep neural networks and the widespread use of mobile devices, the amount of data to be computed and transferred has grown rapidly. In the conventional von Neumann architecture, the limited bandwidth between the processor and memory has become the main bottleneck to improving system performance. Computing-in-memory (CIM) addresses this problem: by reducing data movement, it substantially improves processing efficiency. Combined with the storage characteristics of non-volatile memory, non-volatile computing-in-memory (nvCIM) offers high energy efficiency.
This thesis discusses the challenges faced by nvCIM and proposes a current sense amplifier to address them:
1. With a limited ratio between the high and low cell resistances (R-ratio), the sensing margin between neighboring multiply-and-accumulate values (MACV) is too small, which reduces read yield.
2. To improve accuracy, nvCIM must support multibit inputs and weights so that the MACV has sufficient output precision. As the output precision increases, the access time and power consumption of the CIM macro also increase.
The proposed current sense amplifier tolerates process variation and pre-amplifies the sensing margin to improve read yield. It also reads two bits (00/01/10/11) in a single operation cycle; compared with conventional single-bit sensing schemes, this increases the overall operation speed and reduces the energy consumption of the CIM macro. Under large summation currents, the equivalent offset of the proposed scheme is 2.1~2.6x smaller than that of conventional designs. At the same sensing margin, the read yield improves by at least 16%. In multibit-input, multibit-weight nvCIM structures with 4-bit and 6-bit outputs, the macro is 1.36~1.43x faster and consumes 1.2~1.28x less energy than with conventional sensing circuits.
Finally, the proposed scheme is verified on a 2Mb ReRAM macro fabricated in a TSMC 55nm CMOS process. At a nominal supply voltage of 1V, the measured sensing time of the proposed dual-bit-output scheme is 3.6ns, versus 3.4ns for the conventional single-bit-output scheme. For an nvCIM structure with 4-bit inputs and 4-bit weights, the multiply-and-accumulate access time is 22.2ns with 6-bit output.
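The first challenge in the abstract (a small sensing margin between neighboring MACVs at a limited R-ratio) can be made concrete with an idealized bitline model. This sketch is not taken from the thesis: it assumes binary weights, a fixed read voltage, and activated cells that behave as ideal resistors contributing V/R_LRS or V/R_HRS, and it ignores all device and transistor variation. The function names and example values (0.1 V, 10 kΩ LRS, 8 activated cells) are hypothetical. Under these assumptions the current step between MACV = k and k + 1 equals V/R_LRS - V/R_HRS for every k, so it shrinks as the R-ratio falls and becomes a smaller fraction of the total bitline current as the MACV grows.

```python
# Hypothetical, idealized model of one CIM bitline (no device or transistor variation).
# Assumption: each activated cell acts as a resistor at a fixed read voltage and
# contributes V/R_LRS (weight = 1) or V/R_HRS (weight = 0) to the bitline current.


def bitline_current(k, n_active, v_read, r_lrs, r_hrs):
    """Summed bitline current (A) when k of n_active activated cells are in LRS."""
    if not 0 <= k <= n_active:
        raise ValueError("k must satisfy 0 <= k <= n_active")
    return k * v_read / r_lrs + (n_active - k) * v_read / r_hrs


def macv_step(v_read, r_lrs, r_hrs):
    """Current difference between neighbouring MACVs (k and k + 1); independent of k."""
    return v_read / r_lrs - v_read / r_hrs


def relative_margin(k, n_active, v_read, r_lrs, r_hrs):
    """Step between MACV = k and MACV = k + 1 as a fraction of the current at k + 1."""
    total = bitline_current(k + 1, n_active, v_read, r_lrs, r_hrs)
    return macv_step(v_read, r_lrs, r_hrs) / total


if __name__ == "__main__":
    # Illustrative values only: 0.1 V read voltage, 10 kOhm LRS, 8 activated cells.
    for r_ratio in (3, 10):
        r_hrs = 10e3 * r_ratio
        margins = [relative_margin(k, 8, 0.1, 10e3, r_hrs) for k in range(8)]
        print(f"R-ratio {r_ratio}: " + ", ".join(f"{m:.3f}" for m in margins))
```

The printed values cover the steps from MACV 0->1 up to 7->8. In this model they fall as the MACV grows for both R-ratios, and the R-ratio of 3 gives the smaller margin at every step.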
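The abstract links a smaller equivalent offset to a higher read yield. A simple statistical model illustrates the trend; it is an assumption, not the thesis's yield analysis. The sense amplifier's input-referred offset is taken as zero-mean Gaussian with standard deviation sigma_offset, and the reference is assumed to sit midway between two neighboring MACV levels separated by margin, so a read fails once the offset exceeds half the margin in either direction. The function name is hypothetical; margin and sigma_offset only need to share a unit (for example µA).

```python
import math


def pass_probability(margin, sigma_offset):
    """Probability that |offset| < margin / 2 for a zero-mean Gaussian offset.

    Assumed model: the reference sits midway between two neighbouring MACV levels
    that are `margin` apart, so a read fails once the sense amplifier's
    input-referred offset exceeds half the margin in either direction.
    `margin` and `sigma_offset` must use the same unit (e.g. microamps).
    """
    if sigma_offset <= 0:
        raise ValueError("sigma_offset must be positive")
    # For X ~ N(0, sigma^2): P(|X| < a) = erf(a / (sigma * sqrt(2)))
    return math.erf((margin / 2) / (sigma_offset * math.sqrt(2)))


if __name__ == "__main__":
    # Hypothetical case: half the margin equals one sigma of the baseline offset.
    baseline = pass_probability(2.0, 1.0)
    suppressed = pass_probability(2.0, 1.0 / 2.1)  # offset sigma divided by 2.1
    print(f"baseline {baseline:.3f}, with 2.1x smaller offset {suppressed:.3f}")
```

For example, if half the margin equals one sigma, the pass probability is about 0.68; dividing sigma by 2.1 (the low end of the reported offset suppression) raises it to about 0.96 in this toy model. These numbers only show the direction of the effect and do not reproduce the thesis's reported yield improvement.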
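The abstract states that the proposed amplifier resolves two bits (00/01/10/11) per operation cycle but does not describe the mechanism. As a purely behavioral illustration, the sketch below compares a bitline current against three ascending reference currents at once (a flash-style 2-bit quantizer). It contrasts this with an assumed conventional-style readout that needs two sequential 1-bit comparisons, where the first result selects the second reference. Both functions and all reference values are hypothetical and do not model the thesis's circuit.

```python
from bisect import bisect_right


def _check_refs(refs):
    refs = list(refs)
    if len(refs) != 3 or refs != sorted(refs):
        raise ValueError("expected three ascending reference currents")
    return refs


def sense_dual_bit(i_bl, refs):
    """One-step 2-bit read: compare i_bl against all three references at once."""
    refs = _check_refs(refs)
    level = bisect_right(refs, i_bl)  # number of references that i_bl meets or exceeds (0..3)
    return format(level, "02b")       # 0 -> '00', 1 -> '01', 2 -> '10', 3 -> '11'


def sense_single_bit_twice(i_bl, refs):
    """Two-step read with 1-bit comparisons; the second reference depends on the MSB."""
    low, mid, high = _check_refs(refs)
    msb = int(i_bl >= mid)                     # step 1: middle reference gives the MSB
    lsb = int(i_bl >= (high if msb else low))  # step 2: upper or lower reference gives the LSB
    return f"{msb}{lsb}"


if __name__ == "__main__":
    refs = [10e-6, 20e-6, 30e-6]            # hypothetical reference currents (A)
    currents = [5e-6, 15e-6, 25e-6, 35e-6]  # hypothetical bitline currents (A)
    print([sense_dual_bit(i, refs) for i in currents])
    print([sense_single_bit_twice(i, refs) for i in currents])
```

Both functions return the same codes; the difference is that the first resolves the two bits in one comparison step, while the second needs two steps in sequence.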
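A back-of-the-envelope estimate shows why reading two bits per cycle can pay off even though the measured dual-bit sensing time (3.6ns) is slightly longer than the single-bit one (3.4ns). The sketch assumes that an N-bit output is resolved in sequential sensing cycles, one bit per cycle for the conventional scheme and two bits per cycle for the proposed one, and it counts only sense-amplifier time. The helper name is hypothetical; only the two per-cycle times come from the abstract.

```python
import math

# Per-cycle sensing times measured in the abstract (1 V, TSMC 55nm, 2Mb ReRAM macro).
T_CONV_NS = 3.4  # conventional single-bit-output sense amplifier
T_DUAL_NS = 3.6  # proposed dual-bit-output sense amplifier


def sensing_time_ns(output_bits, bits_per_cycle, t_sense_ns):
    """Sense-amplifier time for one N-bit output resolved in sequential cycles (assumed)."""
    cycles = math.ceil(output_bits / bits_per_cycle)
    return cycles * t_sense_ns


if __name__ == "__main__":
    for bits in (4, 6):
        conv = sensing_time_ns(bits, 1, T_CONV_NS)
        dual = sensing_time_ns(bits, 2, T_DUAL_NS)
        print(f"{bits}-bit output: {conv:.1f} ns vs {dual:.1f} ns "
              f"(sensing phase only, {conv / dual:.2f}x)")
```

For 6-bit output this gives 6 × 3.4 = 20.4ns versus 3 × 3.6 = 10.8ns, about 1.89x for the sensing phase alone, and the 4-bit case gives the same ratio. The macro-level speedup reported in the abstract (1.36~1.43x) is smaller, presumably because other parts of the access time are not shortened by dual-bit sensing.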