Graduate student: Shih, Hsiu-Chuan (石修銓)
Thesis title: Component-Based Fast Modeling and Design Exploration for DRAM
Advisor: Wu, Cheng-Wen (吳誠文)
Oral examination committee: 吳誠文 (Wu, Cheng-Wen), 張彌彰, 張孟凡, 李昆忠, 呂學坤, 黃世安
Degree: Doctoral
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar; 2014–2015)
Language: English
Number of pages: 112
Keywords: DRAM, modeling, design space exploration


    It is widely understood that there is a performance gap between the processor and the DRAM in a computer system, known as the memory wall. The memory wall will become increasingly serious if researchers and practitioners do not properly address it. Therefore, in addition to improving the processor-DRAM interface, there is an urgent need to enhance the performance of the DRAM itself, at the device, circuit, and architecture levels. New DRAM designs need to be developed, and DRAM modeling tools are crucial to their exploration and evaluation. However, existing modeling tools are not efficient enough for the design exploration needed to close this gap, and they lack the flexibility to explore different architectures. In this thesis, we introduce the notion of a component and propose a component-based DRAM modeling method. In this method, we abstract a DRAM design into a framework that covers the DRAM architecture at the component level, the arrays, the floorplan, the whole chip, and the interface. Based on this abstraction, we have developed a modeling tool that accurately predicts the silicon area, delay, and power of a DRAM with high architectural flexibility and short computation time. Our tool has been used to model state-of-the-art DRAM designs that prior works do not support. We have also improved the traditional RC-delay and CV-charge models to achieve higher accuracy, which we verified experimentally against a commodity DDR2 DRAM. Building on the component-based approach, we further propose a generalized-architecture exploration algorithm, in which we introduce the concept of variable links to represent the relations between variables. By operating on the variables and their links, the proposed algorithm expands the exploration space while reducing its complexity.
In our collaboration with the Industrial Technology Research Institute (ITRI), this method successfully identified full-array architecture candidates for a DRAM test chip: a TSV-based 3D DRAM die (for DRAM die stacking) with a low latency of 10 ns and a high bandwidth of 100 GB/s. We have also used this method to show that a particular array architecture has greater potential for the Wide-IO interface than the traditional one. The modeling and exploration approach proposed in this thesis gives system designers an efficient way to identify better DRAM memory systems and thereby mitigate the memory wall.
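The abstract refers to the traditional RC-delay and CV-charge models without stating them. As background only, the sketch below evaluates their common textbook forms: the 50% step-response delay of a lumped RC node, ln(2)·R·C ≈ 0.69·R·C; the widely used 0.38·R·C approximation for the 50% delay of a distributed RC wire; and the energy E = C·ΔV·V_supply drawn from a supply when a capacitance C is charged by ΔV. It does not reproduce the thesis's improved models, and the names `RCStage`, `rc_delay`, `path_delay`, and `cv_charge_energy`, as well as all parameter values, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RCStage:
    """One driver-plus-load stage (hypothetical values, not from the thesis)."""
    r_ohm: float       # effective driver or wire resistance, ohms
    c_farad: float     # total capacitance charged through r_ohm, farads
    distributed: bool  # True: distributed RC wire; False: lumped RC node


def rc_delay(stage: RCStage) -> float:
    """Textbook 50% step-response delay of one RC stage, in seconds."""
    # Lumped node: v(t) = 1 - exp(-t / RC) crosses 50% at t = ln(2) * RC.
    # Distributed wire: the 50% delay is commonly approximated as 0.38 * RC.
    factor = 0.38 if stage.distributed else math.log(2)
    return factor * stage.r_ohm * stage.c_farad


def path_delay(stages: List[RCStage]) -> float:
    """First-order path delay: the sum of the independent stage delays."""
    return sum(rc_delay(s) for s in stages)


def cv_charge_energy(c_farad: float, delta_v: float, v_supply: float) -> float:
    """Energy drawn from the supply to charge c_farad by delta_v, in joules."""
    charge = c_farad * delta_v  # Q = C * dV, coulombs
    return charge * v_supply    # E = Q * V_supply


if __name__ == "__main__":
    # Hypothetical path: a lumped driver node followed by a distributed wire.
    path = [RCStage(1e3, 50e-15, False), RCStage(2e3, 200e-15, True)]
    print(f"path delay: {path_delay(path) * 1e12:.1f} ps")
    # Hypothetical swing: 100 fF charged by 0.6 V from a 1.2 V supply.
    print(f"CV energy:  {cv_charge_energy(100e-15, 0.6, 1.2) * 1e15:.1f} fJ")
```

Summing independent stage delays is a first-order simplification; effects such as input transition time, overdriving, and short-circuit current, which appear in the thesis outline under accuracy enhancement, are not captured by these forms.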
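The abstract describes variable links only at a high level. To illustrate the general idea, and not the thesis's actual algorithm, the sketch below treats each link as a rule that derives one variable from variables already assigned. Only the free variables are enumerated; linked variables are computed, and a link with no valid value (for example, a non-integer row count) prunes that design point. The `Link` class, the `explore` and `exact_div` functions, the variable names, the 512 Mb capacity, and the 8-to-32 subarray rule are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]


@dataclass
class Link:
    """Hypothetical 'variable link': derives `target` from already-assigned `sources`."""
    target: str
    sources: Tuple[str, ...]
    derive: Callable[..., Optional[int]]  # returns None if no valid value exists


def exact_div(a: int, b: int) -> Optional[int]:
    """a // b when b divides a evenly, else None (marks an invalid design point)."""
    return a // b if b != 0 and a % b == 0 else None


def explore(free: Dict[str, List[int]], links: List[Link],
            accept: Callable[[Config], bool]) -> List[Config]:
    """Enumerate only the free variables; linked variables are derived, not enumerated."""
    names = list(free)
    results: List[Config] = []
    for values in product(*(free[n] for n in names)):
        cfg: Config = dict(zip(names, values))
        for link in links:  # assumes links are listed in dependency order
            value = link.derive(*(cfg[s] for s in link.sources))
            if value is None:  # the link cannot be satisfied: prune this point
                break
            cfg[link.target] = value
        else:  # every link resolved; apply the remaining design rule
            if accept(cfg):
                results.append(cfg)
    return results


CAPACITY_BITS = 2 ** 29  # hypothetical 512 Mb die

free_vars = {
    "banks": [4, 8, 16],
    "page_bits": [8192, 16384],  # 1 KB and 2 KB pages
    "cells_per_bitline": [256, 512],
}
links = [
    Link("rows_per_bank", ("banks", "page_bits"),
         lambda b, p: exact_div(CAPACITY_BITS, b * p)),
    Link("subarrays_per_bank", ("rows_per_bank", "cells_per_bitline"),
         exact_div),
]

if __name__ == "__main__":
    # Hypothetical design rule: keep between 8 and 32 subarrays per bank.
    designs = explore(free_vars, links,
                      lambda c: 8 <= c["subarrays_per_bank"] <= 32)
    for d in designs:
        print(d)
```

Here only 3 × 2 × 2 = 12 combinations of free variables are visited. Enumerating `rows_per_bank` and `subarrays_per_bank` as independent variables instead would multiply the search space by their domain sizes and mostly produce inconsistent design points.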

    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Background
      1.2 Related Works
      1.3 Proposed Approaches
      1.4 Dissertation Organization
      1.5 Acknowledgment
    Chapter 2 Background and Related Works
      2.1 DRAM Architecture and Interface
        2.1.1 Floorplan
        2.1.2 Array Architecture
        2.1.3 Interface
      2.2 RC-Delay Model and CV-Charge Model
      2.3 DRAM Modeling Tools
      2.4 Related Works for DRAM Exploration
    Chapter 3 Component-Based Modeling Method
      3.1 DRAM Design Abstraction
        3.1.1 Variables
        3.1.2 Framework
        3.1.3 Input Data Preparation
      3.2 Program Structure and Execution Flow
      3.3 Components
        3.3.1 Scaling Methods
      3.4 Arrays
        3.4.1 Idle Handling
        3.4.2 Boundary Array
      3.5 Whole Chip
        3.5.1 Floorplanning
        3.5.2 Routing Length Estimation
      3.6 Interface
      3.7 Variable Replacement
      3.8 Accuracy Enhancement
        3.8.1 Connection of Operations
        3.8.2 Input Transition and Velocity Saturation
        3.8.3 Variant Bias
        3.8.4 Overdriving
        3.8.5 Short Circuit Current
        3.8.6 Simulation Inaccuracy
    Chapter 4 Design Exploration
      4.1 Links of Variables
      4.2 Fixed-Framework Exploration
      4.3 Generalized-Architecture Exploration
        4.3.1 Pre-process for Links
        4.3.2 Bank Array Exploration
        4.3.3 Two-Phase Exploration
    Chapter 5 Experimental Results
      5.1 Accuracy Validation
        5.1.1 Modeling of 68nm DDR2 SDRAM
        5.1.2 Prediction of 45nm DDR2 SDRAM
      5.2 Case Studies for Related Works
      5.3 ITRI's Test Chips
        5.3.1 63nm Scribe Line Test Chip (SLTC)
        5.3.2 45nm Full Array Test Chip (FATC)
      5.4 Wide-I/O Array Architecture
    Chapter 6 Conclusions and Future Work
      6.1 Conclusions
      6.2 Future Work
        6.2.1 Integration with Full System Platform
        6.2.2 DRAM Architecture Design
        6.2.3 DRAM Compiler
        6.2.4 Yield and Reliability Extension
        6.2.5 Non-volatile Memory Modeling
    Bibliography


    Full-text availability: not authorized for public release (on-campus or off-campus network).
