Graduate student: | 林威宇 Lin, Wei-Yu |
---|---|
Thesis title: | 應用於非揮發性記憶體低功耗與低峰值電流之高帶寬讀取架構 A Low Peak Current and Low Power Read Scheme for High Bandwidth Non-Volatile Memory |
Advisor: | 張孟凡 Chang, Meng-Fan |
Oral defense committee: | 謝志成 Hsieh, Chih-Cheng; 許世玄 Sheu, Shyh-Shyuan |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Electronics Engineering |
Year of publication: | 2018 |
Academic year of graduation: | 107 |
Language: | English |
Number of pages: | 87 |
Keywords (Chinese): | 感測放大器、低功耗、低峰值電流 |
Keywords (English): | Sense amplifier, Low power, Low peak current |
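In the current memory hierarchy, storage is dominated by flash memory. Because of its low access speed and low endurance, however, there is a performance gap between flash and the DRAM used as main memory, which creates a performance bottleneck for the memory system. An emerging memory concept, storage class memory (SCM), serves as a bridge between storage and main memory. Some emerging non-volatile memories (for example, resistive memory and magnetoresistive memory) have become the most likely candidates to replace flash memory in SCM thanks to their low operating voltage and excellent performance. Among the emerging non-volatile memories, spin-torque-transfer magnetoresistive random access memory (STT-MRAM) will play an important role in this new field because of its high access speed, high endurance, and non-volatility. However, reading STT-MRAM faces the following challenges:
1. The small tunneling magnetoresistance ratio (TMR ratio) of STT-MRAM, together with process variation, leaves little difference between the resistance of data 0 and data 1, which lowers the read yield.
2. The low device resistance of STT-MRAM means that, on a long bit line, the resistance difference between data 0 and data 1 is diluted by the parasitic resistance of the bit line, which lowers the read yield.
In addition, in the new wave of artificial intelligence, the bandwidth demand on embedded memory systems keeps growing as deep learning networks develop. As more read circuits are placed inside a memory macro, the problems caused by high power consumption and high peak current must be addressed.
This master's thesis discusses the challenges of achieving high bandwidth in an STT-MRAM macro and how the newly proposed read scheme addresses them. The proposed read scheme consists of two parts:
1. An adaptive local reference current generation mechanism
2. A 2-bit current sense amplifier
The scheme provides current margin enhancement, process variation tolerance, low peak current, and low energy consumption to resolve the difficulties described above.
In simulation analysis with the TSMC 22nm process, the proposed read scheme tolerates a readable TMR ratio 3x to 7.5x lower than a conventional current sense amplifier, and reduces energy consumption by 48% and 53% relative to conventional current-mode reading for bit-line lengths of 512 and 1024 memory cells. In terms of peak current, the new read scheme reduces peak current by 58% and 59% relative to the conventional current read scheme. In addition, in a macro read mode designed specifically for deep learning networks, it saves 58% of the energy relative to the conventional scheme.
In collaboration with TSMC, we implemented a 2Mb STT-MRAM chip and a 0.5Mb ReRAM chip in 22nm and 55nm processes; the measurement conclusions in this thesis are for now based mainly on the 55nm chip. At the nominal operating voltage of 1V and a bit-line length of 512 memory cells, the measured read speed of the proposed circuit is 6.5 nanoseconds (ns).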
In the current memory hierarchy, there is a performance gap between main memory (DRAM) and storage (flash memory) because flash memory has slow access speed and low endurance. This gap creates a performance bottleneck in the memory system. A new memory concept, storage class memory (SCM), has been proposed to act as a bridge between storage and main memory. Emerging non-volatile memories (e.g., ReRAM or MRAM) operate at lower voltages and perform better than flash memory, and aim to replace it in SCM. Among the emerging non-volatile memories, spin-torque-transfer magnetoresistive random access memory (STT-MRAM) is a promising candidate because of its high access speed, high endurance, and non-volatility. However, STT-MRAM faces some challenges in read operation:
1. The low TMR ratio of the STT-MRAM cell, combined with process variation, results in a small resistance difference between data-0 and data-1 and therefore a low sensing yield.
2. The low resistance of the STT-MRAM cell makes it vulnerable to the high parasitic resistance of a long bit line (BL), which results in a low sensing yield.
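The second challenge can be illustrated numerically. The Python sketch below treats the read path as the cell resistance in series with a lumped bit-line parasitic resistance and computes the resistance ratio that the sense amplifier effectively sees. The function names and every numeric value (cell resistance, TMR ratio, wire resistance per cell) are illustrative assumptions, not figures from the thesis.

```python
def effective_tmr(r_p, tmr, r_parasitic):
    """Resistance ratio seen through a series parasitic resistance.

    r_p:         low-state (parallel) cell resistance in ohms
    tmr:         intrinsic TMR ratio, (R_AP - R_P) / R_P
    r_parasitic: lumped series bit-line resistance in ohms
    """
    r_ap = r_p * (1.0 + tmr)       # high-state (anti-parallel) resistance
    low = r_p + r_parasitic        # total path resistance for the low state
    high = r_ap + r_parasitic      # total path resistance for the high state
    return (high - low) / low


def bitline_resistance(cells, ohms_per_cell):
    """Lumped bit-line resistance for a cell at the far end of the line."""
    return cells * ohms_per_cell


if __name__ == "__main__":
    R_P = 3000.0          # hypothetical cell resistance (ohms)
    TMR = 1.0             # hypothetical intrinsic TMR ratio (100%)
    OHMS_PER_CELL = 2.0   # hypothetical wire resistance per cell pitch
    for cells in (0, 512, 1024):
        r_bl = bitline_resistance(cells, OHMS_PER_CELL)
        print(cells, round(effective_tmr(R_P, TMR, r_bl), 3))
```

Under these assumed values the effective ratio shrinks as the bit line gets longer, which is the dilution effect described in item 2; a lower cell resistance makes the same wire resistance hurt more.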
In addition, in the coming AI era, deep neural networks (DNNs) play an important role and require high throughput from embedded memory systems. As more sense amplifiers are implemented in one macro, the resulting large energy consumption and high peak current must be handled.
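The trade-off in this paragraph can be sketched with simple arithmetic: read bandwidth grows with the number of sense amplifiers working in parallel, and so does the worst-case supply current if they all draw their peak at the same instant. The Python sketch below uses hypothetical function names and hypothetical numbers; it is a back-of-the-envelope model, not the analysis used in the thesis.

```python
def read_bandwidth_bits_per_s(num_sense_amps, bits_per_amp, access_time_s):
    """Bits delivered per second when all sense amplifiers read in parallel."""
    return num_sense_amps * bits_per_amp / access_time_s


def worst_case_peak_current_a(num_sense_amps, peak_per_amp_a):
    """Upper bound on supply current if every amplifier peaks simultaneously."""
    return num_sense_amps * peak_per_amp_a


if __name__ == "__main__":
    ACCESS_TIME_S = 10e-9      # hypothetical access time (10 ns)
    PEAK_PER_AMP_A = 100e-6    # hypothetical peak current per amplifier (100 uA)
    for n in (32, 64, 128):
        bw = read_bandwidth_bits_per_s(n, 1, ACCESS_TIME_S)
        ipk = worst_case_peak_current_a(n, PEAK_PER_AMP_A)
        print(f"{n} amplifiers: {bw / 1e9:.1f} Gb/s, up to {ipk * 1e3:.1f} mA peak")
```

In this simplified model both quantities scale linearly with the amplifier count, which is why a scheme that lowers the per-read peak current and energy matters more as bandwidth increases.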
In this work, we discuss the issues in high-bandwidth reading, especially with STT-MRAM, and propose a read scheme that includes:
1. An adaptive local reference current generation scheme (AL-REF)
2. A 2-bit current-mode sense amplifier (2B-CSA)
The scheme features margin enhancement, offset suppression, low peak current, and low energy consumption, which address the problems described above in a high-bandwidth STT-MRAM macro.
In simulation with TSMC 22nm technology, the proposed scheme tolerates a TMR ratio >3x to 7.5x lower than the conventional scheme at a given R_P. Moreover, it achieves 48% and 53% lower read energy than the conventional SA at BL lengths of 512 and 1024 cells, respectively. The peak current is reduced by 58% and 59% compared with the conventional SA at BL lengths of 512 and 1024 cells, respectively. In addition, the macro energy consumption is reduced by 58% with the operation mode designed specifically for DNNs.
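To read these percentages as absolute quantities, each reduction can be applied to a baseline value. The Python sketch below records the relative reductions reported above and converts them into remaining fractions; the baseline values are hypothetical, since the abstract gives only relative figures, and the names are chosen for illustration.

```python
# Relative reductions reported in the abstract, keyed by bit-line length in cells.
REPORTED_REDUCTION = {
    "read_energy": {512: 0.48, 1024: 0.53},
    "peak_current": {512: 0.58, 1024: 0.59},
}


def remaining_fraction(reduction):
    """Fraction of the baseline that remains after a relative reduction."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return 1.0 - reduction


def apply_reduction(baseline, reduction):
    """Scale a baseline value (any unit) by the remaining fraction."""
    return baseline * remaining_fraction(reduction)


if __name__ == "__main__":
    BASELINE_ENERGY_PJ = 10.0  # hypothetical conventional read energy (pJ)
    for cells, reduction in REPORTED_REDUCTION["read_energy"].items():
        energy = apply_reduction(BASELINE_ENERGY_PJ, reduction)
        print(f"BL = {cells} cells: {energy:.2f} pJ of {BASELINE_ENERGY_PJ:.2f} pJ")
```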
Finally, the proposed scheme is verified in both a 0.5Mb ReRAM macro fabricated in a TSMC 55nm CMOS process and a 2Mb STT-MRAM macro fabricated in a TSMC 22nm CMOS process; the measurement results reported here are, for now, based on the 0.5Mb macro. The measured access time of the proposed scheme is 6.5ns at a typical VDD of 1V and a BL length of 512 cells.