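Graduate student: 趙威丞 (Zhao, Wei-Chang)
Thesis title: 高速節能高密度的三元內容循址記憶體
A High Speed, Energy Efficient and High Density 14T Ternary Content Addressable Memory
Advisor: 張孟凡 (Chang, Meng-Fan)
Oral defense committee: 鄭桂忠 (Tang, Kea-Tiong)
洪浩喬 (Hong, Hao-Chiao)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2017
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 58
Keywords (translated from Chinese): ternary content-addressable memory, static random-access memory, memory
Keywords (English): TCAM, SRAM, Memory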
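  • To meet the coming era of big data generated by the Internet of Things, cloud computing will require hardware that can handle large volumes of data transmission, searching, and comparison.
    Ternary content-addressable memory (TCAM) can compare an entire two-dimensional array of data within a single clock cycle. This high-speed property has led to its wide use in network routers and look-up tables. However, conventional SRAM-based TCAM faces two major problems: a large bit cell area and high power consumption. A conventional TCAM cell consists of 16 transistors: two 6-transistor SRAM cells and a 4-transistor comparison circuit. This 16-transistor structure occupies excessive area and raises manufacturing cost. In addition, because of its parallel search, TCAM consumes a large amount of power, which results in low search efficiency and high power density. This problem grows more serious in advanced process nodes. The three main contributors to the high power are the match-lines, the search-lines, and bit cell leakage, which together account for a large share of search power. Of the three, the match-lines take the largest share. The two most direct ways to reduce match-line power are to charge the match-line to a lower voltage or to reduce its capacitance. Previous designs usually adopt the first approach. As for bit cell leakage, lowering the cell supply voltage is effective, but it slows match-line discharge in the conventional scheme or yields a smaller voltage difference in the charging-based scheme.
    In summary, TCAM design faces two major challenges: reducing bit cell area and reducing power consumption. In this design, we propose a new 14-transistor bit cell architecture that achieves (1) a smaller cell area than the conventional design and (2) a lighter load on the match-line than the conventional design, and therefore lower power consumption.
    Because of this new architecture, its write and search operations differ from the conventional ones. For writing, instead of the conventional two-sided write, we adopt a single-sided write, assisted by a column-wise, data-aware cell supply voltage reduction combined with word-line boosting.
    The chip is implemented in a 28 nm process using a modified compact 6-transistor SRAM cell to form a 14-transistor TCAM with a capacity of 320 thousand bits. With this new architecture and these operating techniques, this work achieves:
    1. A bit cell area 0.9 times that of the conventional bit cell.
    2. A search time of 0.9 ns.
    3. An energy efficiency of 0.546 fJ/search/bit, which after normalization is the best among designs in 28 nm or more advanced technologies.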


    With the coming of the Big Data era driven by the Internet of Things (IoT), cloud computing requires hardware capable of handling large volumes of data transmission, searching, and comparison.
    Ternary Content Addressable Memory (TCAM) can compare input search data against an entire array of stored information in a single clock cycle. This high-speed property makes TCAM widely used in network routers and look-up tables. However, conventional SRAM-based TCAM suffers from two main issues: large bit cell area and high power consumption. A conventional TCAM bit cell is composed of 16 transistors: two 6T SRAM bit cells and a 4T comparison logic circuit. This 16T structure results in a large TCAM array area and therefore high cost. In addition, because of its fully parallel search operation, TCAM consumes a large amount of power, which leads to poor search efficiency and high power density. This problem becomes more severe in advanced process nodes. The three largest contributors to TCAM power are the match-line (ML), the search-line (SL), and bit cell leakage, which together account for a large proportion of the power consumed during searching. Among these three, the ML consumes the most power. The two direct methods to reduce ML power are to charge the ML to a lower voltage or to reduce the capacitance on the ML. Previous works usually adopt the former. As for bit cell leakage, reducing the cell VDD (CVDD) can effectively decrease leakage power. However, this method lowers speed in the traditional ML developing scheme or degrades the sensing margin in the current-race scheme.
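    As a rough illustration of the ternary matching described above, the sketch below models the logical result of a TCAM search in Python. The 'X' don't-care encoding, the function names, and the example table are assumptions made for illustration and are not taken from the thesis. The loop models only which rows match, not the single-cycle parallel hardware or the ML circuitry.

```python
from typing import List


def cell_matches(stored: str, key_bit: str) -> bool:
    """A ternary cell matches if it stores 'X' (don't care) or equals the search bit."""
    return stored == "X" or stored == key_bit


def tcam_search(entries: List[str], key: str) -> List[bool]:
    """Return one match flag per stored word (one flag per match-line).

    In hardware all rows are compared in parallel; this loop only
    reproduces the logical outcome row by row.
    """
    flags = []
    for word in entries:
        if len(word) != len(key):
            raise ValueError("stored word and search key must have equal width")
        # A row matches only if every cell in the row matches its search bit.
        flags.append(all(cell_matches(s, k) for s, k in zip(word, key)))
    return flags


# Hypothetical 3-entry, 4-bit table used only for demonstration.
table = ["10XX", "1011", "0XXX"]
print(tcam_search(table, "1011"))  # [True, True, False]
```

    Because a stored 'X' matches either search bit, a single key can hit several rows at once, which is why a TCAM reports one match result per row.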
    Overall, the two major challenges in TCAM design are reducing bit cell area and reducing power consumption. In this work, we propose a new 14T TCAM bit cell to achieve the following goals: (1) smaller area than the conventional 16T SRAM-based TCAM bit cell; (2) less loading on the ML, which reduces ML power consumption.
    Because of the new structure of the proposed cell, its write and search operations differ from those of the conventional 16T TCAM. Unlike conventional differential writing, the 14T TCAM adopts single-ended write and uses a column-wise, data-aware cell-VDD (CVDD) lowering scheme with word-line (WL) boost to improve the write margin.
    A 1024x288b 14T-TCAM macro is fabricated in a 28nm process with a modified foundry compact 6T SRAM cell, and achieves:
    1. A bit cell area 0.9 times that of the traditional 16T SRAM-based TCAM.
    2. A 0.9ns search time.
    3. An energy efficiency of 0.546 fJ/search/bit, which after normalization is the best among designs in 28nm or more advanced technologies.
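
    To put the reported energy efficiency in context, the sketch below multiplies it by the macro size to estimate the energy of one full-array search. It assumes that the 0.546 fJ/search/bit figure applies uniformly to every bit of the 1024x288b macro; the function and constant names are illustrative and do not come from the thesis.

```python
ROWS = 1024                 # words in the macro (from the abstract)
BITS_PER_WORD = 288         # bits per word (from the abstract)
ENERGY_FJ_PER_BIT = 0.546   # fJ/search/bit (from the abstract)


def energy_per_search_pj(rows: int, bits_per_word: int, fj_per_bit: float) -> float:
    """Energy of one search across the whole array, in picojoules.

    Assumption: every bit contributes the same per-bit search energy.
    """
    total_bits = rows * bits_per_word
    return total_bits * fj_per_bit / 1000.0  # 1 pJ = 1000 fJ


energy_pj = energy_per_search_pj(ROWS, BITS_PER_WORD, ENERGY_FJ_PER_BIT)
print(ROWS * BITS_PER_WORD)      # 294912
print(f"{energy_pj:.2f} pJ")     # 161.02 pJ
```

    Under this assumption, one search over all 294,912 bits costs roughly 161 pJ.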

    Contents
    Abstract (Chinese)
    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Memory Landscape
    1.1.1 RAM
    1.1.2 CAM
    1.1.3 ROM
    1.1.4 Programmable NVMs
    1.2 TCAM Background
    1.3 Overview of the Thesis
    Chapter 2 Introduction of SRAM
    2.1 Conventional 6T-SRAM
    2.1.1 Layout of 6T-SRAM
    2.2 Write Operation
    2.3 Read Operation
    2.4 Static Noise Margin
    2.4.1 Hold Static Noise Margin
    2.4.2 Read Static Noise Margin
    2.5 Assist Scheme for Read/Write in Previous Work
    2.5.1 Word Line Boost
    2.5.2 Negative Bit Line
    2.5.3 Cell VDD Down
    Chapter 3
    3.1 Introduction of conventional TCAM
    3.1.1 Schematic of conventional TCAM
    3.1.2 Layout of conventional TCAM
    3.2 Write Operation
    3.3 Read Operation
    3.4 Search Operation
    3.5 ML structure
    3.5.1 Nor-type Cell
    3.5.2 Nand-type Cell
    3.6 Common ML Developing Scheme
    3.6.1 Low-Swing Scheme
    3.6.2 Current-Race Scheme
    3.6.3 Selective-Precharge Scheme
    3.6.4 Pipelining Scheme
    3.7 Summary
    Chapter 4 Introduction and Analysis of 14T-TCAM
    4.1 14T-TCAM Cell
    4.1.1 Schematic of 14T-TCAM Cell
    4.1.2 Layout of 14T-TCAM Cell
    4.2 Write Operation
    4.2.1 Challenges in Single-Ended Write
    4.2.2 Write Operation of 14T-TCAM
    4.3 Read Operation
    4.4 Search Operation
    4.4.1 Issues of 14T-TCAM Search Operation
    4.5 Macro Structure
    4.6 ML tracking
    4.7 Performance Analysis
    4.7.1 Speed Comparison
    4.7.2 Energy Comparison
    4.7.3 Overall Comparison
    Chapter 5 Chip Implementation and Measurement
    5.1 Chip Implementation
    5.2 Measurement Result
    5.3 Conclusion
    Reference

