Graduate student: | 石修銓 Shih, Hsiu-Chuan |
---|---|
Thesis title: | 動態記憶體的組件合成法快速模型建立與設計探索 Component-Based Fast Modeling and Design Exploration for DRAM |
Advisor: | 吳誠文 Wu, Cheng-Wen |
Oral defense committee: | 吳誠文, 張彌彰, 張孟凡, 李昆忠, 呂學坤, 黃世安 |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2015 |
Academic year of graduation: | 103 (ROC calendar) |
Language: | English |
Number of pages: | 112 |
Keywords (Chinese): | 動態記憶體, 模型建立, 設計空間探索 |
Keywords (English): | DRAM, modeling, design space exploration |
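The performance gap between the processor and dynamic random-access memory (DRAM) is known as the memory wall. If researchers and developers do not confront this problem, the memory wall will grow steadily worse. Therefore, in addition to strengthening the processor-memory interface, enhancing the performance of the DRAM itself is urgently needed, whether at the device, circuit, or architecture level. New DRAM designs need to be developed now, and DRAM modeling tools are therefore important for design exploration and performance evaluation. Existing modeling tools are not efficient enough to tackle the memory wall in practice, and they lack the architecture flexibility needed to explore different memory architectures. In this thesis, we introduce the concept of a component and propose a component-based DRAM modeling method. The method abstracts a DRAM into a framework comprising the component-level memory architecture, the array, the floorplan, the whole chip, and the interface. Based on this abstraction, we developed a modeling tool that accurately predicts silicon area, delay, and power, with high architecture flexibility and short run time. Our tool has been applied to state-of-the-art DRAM designs that current models cannot support. We also improve the traditional RC-delay and CV-charge models to achieve higher accuracy, which is verified experimentally against a commercial DDR2 DRAM. Building on the component-based model, we further propose a generalized-architecture design exploration method. In this method, we introduce the concept of variable links to represent the relations between variables. By processing the variables and their links, our algorithm can expand the design space while reducing complexity. In our collaborative project with the Industrial Technology Research Institute (ITRI), this method was implemented to find candidates for a full array test chip. The test chip is a TSV-based 3D DRAM with a low latency of 10 ns and a high bandwidth of 100 GB/s. We also use this exploration method to show that a certain memory architecture has more room for development on the Wide-IO interface than the traditional architecture. Component-based modeling and architecture exploration give system designers a practical way to identify better DRAM systems and thereby address the root of the problem, the memory wall.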
It is widely understood that there is a performance gap between the processor and the DRAM in a computer system, known as the memory wall. The memory wall will keep growing if it is not properly addressed by researchers and practitioners. Therefore, in addition to improving the processor-DRAM interface, there is a pressing need to enhance the performance of the DRAM itself at the device, circuit, and architecture levels. New DRAM designs need to be developed, and DRAM modeling tools are crucial to design exploration and evaluation. The existing modeling tools used in design exploration are not efficient enough to help close the gap, and they lack the flexibility to explore different architectures. In this thesis, we introduce the notion of a component and propose a component-based DRAM modeling method. In this method, we abstract a DRAM design as a framework containing the DRAM architecture at the component level, the arrays, the floorplan, the whole chip, and the interface. Based on this abstraction, a modeling tool has been developed to accurately predict the silicon area, delay, and power of the DRAM with high architecture flexibility and short computation time. Our tool has been used to model state-of-the-art DRAM designs not supported by prior work. We have also improved the traditional RC-delay model and CV-charge model to achieve higher accuracy. The modeling accuracy is verified against a commodity DDR2 DRAM in our experiment. With the component-based approach, we also propose a generalized-architecture exploration algorithm, in which we introduce the concept of variable links, which represent the relations between variables. By handling the variables and their links, the proposed algorithm can expand the exploration space while reducing the complexity.
In our collaboration with ITRI, this method successfully identified the full array architecture candidates for a DRAM test chip, which is a TSV-based 3D DRAM die (to be used for DRAM die stacking) with a low latency of 10 ns and a high bandwidth of 100 GB/s. We have also used this method to find an array style that has higher potential for the Wide-IO interface than the traditional one. The modeling and exploration approach proposed in this thesis provides system designers with a way to efficiently identify a better DRAM memory system and thereby mitigate the memory wall.
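The abstract refers to the traditional RC-delay and CV-charge models that the thesis improves upon. As a rough illustration only, the following sketch shows the textbook forms of those two models: the Elmore delay of an RC ladder and the energy drawn from the supply when charging a capacitance. It does not reproduce the thesis's improved models. The function names `elmore_delay` and `cv_charge_energy` and the example wire values are hypothetical and do not come from the thesis.

```python
def elmore_delay(resistances, capacitances):
    """Textbook Elmore delay of an RC ladder, in seconds.

    Segment i has series resistance resistances[i] (ohms) followed by a
    grounded capacitance capacitances[i] (farads). Each capacitance is
    charged through all of the resistance upstream of it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


def cv_charge_energy(capacitance, v_swing, v_supply):
    """Textbook CV-charge estimate, in joules.

    Charging a capacitance through a swing of v_swing draws a charge
    Q = C * v_swing from the supply, which costs Q * v_supply.
    """
    return capacitance * v_swing * v_supply


# Hypothetical example values (assumptions, not data from the thesis):
# a line modeled as four identical segments of 250 ohms and 25 fF each.
line_delay = elmore_delay([250.0] * 4, [25e-15] * 4)
line_energy = cv_charge_energy(4 * 25e-15, v_swing=1.2, v_supply=1.2)
print(f"Elmore delay: {line_delay * 1e12:.1f} ps")
print(f"Energy per transition: {line_energy * 1e15:.1f} fJ")
```

Splitting a wire into more segments brings the ladder closer to a distributed RC line, which is one common way such first-order estimates are refined.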