Exploration of Cache Architecture and Design Optimization for Performance, Power, and Thermal Issues in VLSI Technology

Graduate student:	許博揚 Hsu, Po-Yang
Dissertation title:	超大型積體電路設計下，針對效能、功率、散熱之快取記憶體架構探索及設計最佳化 Exploration of cache architecture and design optimization for performance, power, thermal issues in VLSI technology
Advisor:	黃婷婷 Hwang, Tingting
Oral defense committee:	金仲達 Chung-Ta King, 張世杰 Shih-Chieh Chang, 黃婷婷 Tingting Hwang, 王廷基 Ting-Chi Wang, 楊佳玲 Chia-Lin Yang, 黃俊達 Juinn-Dar Huang
Degree:	Doctor (Ph.D.)
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication:	2014
Academic year of graduation:	103 (ROC calendar, i.e., 2014-2015)
Language:	English
Number of pages:	108
Keywords:	3D IC, thermal dissipation, cache configuration, through silicon via (TSV), compressed cache, multi-core system

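With advances in semiconductor process technology, the number of transistors that fit in a unit of chip area keeps increasing, so electronic components can be designed to be more complex and more powerful. In addition, advanced integration and packaging technologies such as System in Package (SiP) and Through Silicon Via (TSV) can integrate many components with different functions on the same chip, further improving system and circuit performance. Although these technologies bring many benefits, they also bring new challenges. As more components are placed in a given chip area, the power density of the circuit rises. This causes thermal problems on the chip and lowers the reliability and performance of the system. Moreover, more powerful components require more power, so managing how the components in a system operate is an important problem in building low-power, high-performance systems. In this dissertation, we propose new architectures, management methods, and optimization techniques at two levels, cache architecture and circuit design, to improve system performance, power consumption, and thermal behavior.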

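First, for physical circuit design optimization, we study the thermal problem of 3D ICs. In prior work, the stacked through silicon via (stacked TSV) technique proposed by Chen effectively improves heat conduction in 3D ICs. However, this architecture was applied only to the power delivery network. In this work, we use the stacked TSV architecture to lower the operating temperature of the circuit without adding much routing wirelength. We developed a three-stage placement algorithm that relocates and stacks TSVs during the routing stage of a 3D IC.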
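A one-dimensional lumped thermal-resistance model shows why a stacked TSV helps and why its distance to the heat source matters. The sketch below is a toy built on assumptions and is not the thermal model used in the dissertation. It treats the stacked TSV column as a low-resistance vertical path in parallel with bulk silicon. Heat reaches that column through a lateral resistance proportional to distance. All resistance values are made up.

```python
def parallel(*resistances: float) -> float:
    """Equivalent resistance of thermal resistances connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def temperature_rise(power_w, tiers, r_bulk, r_tsv, r_lat_per_um, distance_um=None):
    """Steady-state temperature rise (K) of a heat source on the top tier.

    Toy model (assumption): heat reaches an ideal heat sink either straight
    down through `tiers` layers of bulk silicon, or laterally over
    `distance_um` to a stacked TSV column and then down that column.
    `distance_um=None` means no stacked TSV is available nearby.
    """
    bulk_path = tiers * r_bulk                       # K/W, in series through every tier
    if distance_um is None:
        return power_w * bulk_path
    tsv_path = r_lat_per_um * distance_um + tiers * r_tsv
    return power_w * parallel(bulk_path, tsv_path)


if __name__ == "__main__":
    # Hypothetical parameters: 1 W source, 3 tiers, 10 K/W bulk and 1 K/W TSV per tier,
    # 0.05 K/W per micrometre of lateral spreading.
    for d in (None, 0, 100, 400):
        dt = temperature_rise(1.0, 3, 10.0, 1.0, 0.05, d)
        print(f"distance={d}: dT={dt:.2f} K")
```

With these made-up numbers, the rise is 30.00 K with no TSV. It is 2.73 K, 6.32 K, and 13.02 K with a stacked TSV at 0, 100, and 400 µm, respectively. The benefit shrinks as the column moves away from the hot spot. A TSV-location algorithm therefore has to trade this effect against the wirelength cost of moving a TSV.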

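Next, for the cache memory system, we developed a method that dynamically adjusts the cache configuration according to thread criticality. In prior work, Zhang proposed a reconfigurable cache architecture to improve system performance and power consumption, but that work was applied only to single-core systems. We therefore predict the performance criticality of each thread while a parallel program runs on a multi-core system, and use this information to adjust the cache configuration of the multi-core system.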
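The sketch below shows one way to turn per-thread criticality into a cache configuration. The criticality estimate weights each core's cache misses by their approximate stall penalty, following the idea behind the thread criticality predictors of Bhattacharjee and Martonosi [58]. The latencies, the capacity menu, and the thresholds are hypothetical, and this is not presented as the dissertation's actual policy.

```python
from bisect import bisect_right


def criticality(l2_hits: int, l2_misses: int,
                l2_latency: int = 10, mem_latency: int = 200) -> int:
    """Estimated miss-stall cycles of one thread in an interval.

    l2_hits: private-L1 misses that were served by the L2.
    l2_misses: misses that had to go to main memory.
    The latencies are hypothetical weights, not measured values.
    """
    return l2_hits * l2_latency + l2_misses * mem_latency


def choose_capacities(scores, levels_kb=(2, 4, 8), thresholds=(0.4, 0.75)):
    """Map each thread's criticality, normalised to the most critical thread,
    to a cache capacity: critical threads get larger caches, and
    non-critical threads get smaller, lower-energy ones."""
    top = max(scores)
    capacities = []
    for s in scores:
        ratio = s / top if top else 1.0
        # ratio < 0.4 -> smallest, 0.4 <= ratio < 0.75 -> middle, otherwise largest
        capacities.append(levels_kb[bisect_right(thresholds, ratio)])
    return capacities


if __name__ == "__main__":
    counters = [(100, 10), (100, 50), (50, 5), (200, 100)]   # (l2_hits, l2_misses) per thread
    scores = [criticality(h, m) for h, m in counters]
    print(scores)                      # [3000, 11000, 1500, 22000]
    print(choose_capacities(scores))   # [2, 4, 2, 8]
```

Normalising to the most critical thread keeps the mapping independent of interval length. Only the relative slowness of the threads decides who gets the larger cache.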

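Finally, for compressed cache architecture, we propose a compaction-free compressed cache architecture for high-performance multi-core systems. Compressed caches are usually applied to the last level cache (LLC). By compressing the stored data, the cache can hold more data. However, because compressed data vary in size, storage fragmentation occurs in this cache architecture. When it occurs, a compaction process is invoked to rearrange where data are stored and create enough contiguous space for the compressed data. Compaction, however, costs a significant amount of extra execution time and reduces the benefit of the compressed cache. In this work, we therefore design a compaction-free compressed cache architecture that eliminates all of the performance overhead of compaction.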
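The fragmentation problem can be made concrete with a toy model of one cache set. Its data array is divided into fixed-size segments, and a compressed line must occupy a contiguous run of segments. The sketch below is illustrative only and is not the dissertation's architecture. The class and method names are hypothetical, eviction is not modelled, and compaction is the simplest possible pass that slides every line down. It shows the case that compaction-free designs target: enough free segments exist in total, yet no contiguous run is large enough, so lines must be moved before the insertion can proceed.

```python
from typing import Dict, List, Optional, Tuple


class SegmentedSet:
    """One cache set whose data array is split into fixed-size segments."""

    def __init__(self, num_segments: int):
        self.num_segments = num_segments
        self.lines: Dict[str, Tuple[int, int]] = {}   # tag -> (start segment, length)

    def _occupied(self) -> List[bool]:
        used = [False] * self.num_segments
        for start, length in self.lines.values():
            for i in range(start, start + length):
                used[i] = True
        return used

    def free_segments(self) -> int:
        return self._occupied().count(False)

    def find_run(self, need: int) -> Optional[int]:
        """First start index of `need` contiguous free segments, or None."""
        run = 0
        for i, used in enumerate(self._occupied()):
            run = 0 if used else run + 1
            if run == need:
                return i - need + 1
        return None

    def compact(self) -> int:
        """Slide every line to the lowest free offset; return segments moved."""
        moved, cursor = 0, 0
        for tag, (start, length) in sorted(self.lines.items(), key=lambda kv: kv[1][0]):
            if start != cursor:
                moved += length
                self.lines[tag] = (cursor, length)
            cursor += length
        return moved

    def evict(self, tag: str) -> None:
        del self.lines[tag]

    def insert(self, tag: str, need: int) -> int:
        """Store a compressed line of `need` segments; return segments moved by compaction."""
        if need > self.free_segments():
            raise ValueError("eviction required (not modelled in this sketch)")
        start = self.find_run(need)
        moved = 0
        if start is None:          # enough free space in total, but fragmented
            moved = self.compact()
            start = self.find_run(need)
        self.lines[tag] = (start, need)
        return moved


if __name__ == "__main__":
    s = SegmentedSet(8)
    for tag, size in (("A", 2), ("B", 3), ("C", 2)):
        s.insert(tag, size)
    s.evict("B")                   # leaves free segments 2-4 and 7: 4 free, largest run 3
    print(s.insert("D", 4))        # 2 (line C had to be moved)
```

Every segment moved during compaction stands for extra data-array reads and writes on the critical path of the insertion. This is the penalty a compaction-free design avoids.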

Owing to advances in VLSI process technology, the transistor count of a single IC continues to grow, so more complex and powerful devices can be manufactured in a small area. Furthermore, modern integration technologies such as system in package (SiP) and through silicon via (TSV) make it possible to integrate heterogeneous devices within the same chip. With these technologies, circuit and system performance can be greatly improved. However, they also bring new challenges. As more and more devices (transistors) are placed in a given area, the power density of the chip increases. This causes severe thermal problems, which can degrade system reliability and performance. Moreover, increasing the number of devices in a single IC requires additional power budget. To alleviate the power wall, device management is required to achieve systems with high performance and low energy consumption. In this dissertation, we explore cache architectures and propose design optimization techniques that address performance, power, and thermal issues at the system and physical design levels.

First, at the physical design level, we study stacked signal TSVs for thermal dissipation in global routing for 3D ICs. The stacked TSV structure proposed by Chen et al. is very efficient at dissipating heat in 3D ICs. However, the original stacked TSV structure was used only in the power network. In this work, we leverage stacked signal TSVs to minimize temperature with small wiring overhead. Based on the stacked signal TSV structure, we design a three-stage TSV location algorithm for global routing.
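The table of contents names three stages: initial TSV assignment, stacked-TSV relocation, and TSV restoration for timing violations. The sketch below arranges them as a greedy skeleton. Everything beyond the stage names is a hypothetical assumption made for illustration:

- Nets are reduced to pin coordinates on a single grid.
- Wirelength is the half-perimeter of the bounding box (HPWL), and HPWL also serves as a crude delay proxy.
- `stack_sites` stands for grid cells where a TSV already exists in the adjacent tier, so a TSV placed there would be stacked.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def hpwl(points: List[Cell]) -> int:
    """Half-perimeter wirelength of the bounding box around `points`."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class Net:
    name: str
    pins: List[Cell]
    tsv: Optional[Cell] = None
    initial_tsv: Optional[Cell] = None

    def wirelength(self) -> int:
        return hpwl(self.pins + [self.tsv])


def assign_initial(nets: List[Net]) -> None:
    """Stage 1 (assumed rule): put each TSV at the centre of its net's bounding box."""
    for n in nets:
        xs = [p[0] for p in n.pins]
        ys = [p[1] for p in n.pins]
        n.tsv = n.initial_tsv = ((min(xs) + max(xs)) // 2, (min(ys) + max(ys)) // 2)


def relocate_to_stacks(nets: List[Net], stack_sites: List[Cell],
                       hot_spot: Cell, wl_budget: int) -> None:
    """Stage 2 (assumed rule): move a TSV onto the free stack site closest to the
    hot spot, provided the net's HPWL grows by at most `wl_budget`."""
    free = set(stack_sites)
    for n in nets:
        base = n.wirelength()
        for site in sorted(free, key=lambda s: (manhattan(s, hot_spot), s)):
            if hpwl(n.pins + [site]) - base <= wl_budget:
                n.tsv = site
                free.discard(site)
                break


def restore_violations(nets: List[Net], max_wl: int) -> List[str]:
    """Stage 3: undo the relocation of nets whose HPWL (delay proxy) exceeds `max_wl`."""
    restored = []
    for n in nets:
        if n.tsv != n.initial_tsv and n.wirelength() > max_wl:
            n.tsv = n.initial_tsv
            restored.append(n.name)
    return restored


if __name__ == "__main__":
    nets = [Net("a", [(0, 0), (4, 4)]), Net("b", [(10, 0), (12, 2)])]
    assign_initial(nets)                                   # a -> (2, 2), b -> (11, 1)
    relocate_to_stacks(nets, [(5, 5), (3, 3)], hot_spot=(6, 6), wl_budget=2)
    print([n.tsv for n in nets])                           # [(5, 5), (11, 1)]
    print(restore_violations(nets, max_wl=9))              # ['a']
```

The stages are ordered so that the thermal move is attempted first and then rolled back only where it breaks the timing proxy. This mirrors the "relocation, then restoration" structure implied by the stage names.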

Second, at the system design level, we propose thread-criticality-aware dynamic cache reconfiguration for multi-core systems. The reconfigurable cache proposed by Zhang et al. can improve system performance and energy consumption. However, the original reconfigurable cache was used only in single-core systems. In this work, we dynamically predict the thread criticality of a parallel application and tune the cache architecture of the multi-core system accordingly.
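A configurable cache of the kind proposed by Zhang et al. is typically tuned by a short search over its parameters rather than by trying every combination. The sketch below shows a greedy search that tunes one parameter at a time: it grows cache size, then line size, then associativity, and stops each parameter as soon as energy stops improving. It follows the spirit of the self-tuning heuristic in [23]. The parameter menus are assumptions, and `toy_energy` is a made-up cost table, not measured data.

```python
from typing import Callable, Dict, Tuple

# Assumed configuration menus, ordered from smallest to largest.
OPTIONS: Dict[str, Tuple[int, ...]] = {
    "size_kb": (2, 4, 8),
    "line_b": (16, 32, 64),
    "ways": (1, 2, 4),
}
SEARCH_ORDER = ("size_kb", "line_b", "ways")


def greedy_tune(energy: Callable[[Dict[str, int]], float]):
    """Tune one parameter at a time, starting from the smallest configuration."""
    config = {name: values[0] for name, values in OPTIONS.items()}
    best = energy(config)
    for name in SEARCH_ORDER:
        for value in OPTIONS[name][1:]:
            trial = {**config, name: value}
            e = energy(trial)
            if e >= best:
                break              # stop growing this parameter once energy stops improving
            config, best = trial, e
    return config, best


def toy_energy(c: Dict[str, int]) -> float:
    """Made-up additive cost table used only to exercise the search."""
    return ({2: 100, 4: 70, 8: 80}[c["size_kb"]]
            + {16: 30, 32: 20, 64: 25}[c["line_b"]]
            + {1: 10, 2: 12, 4: 15}[c["ways"]])


if __name__ == "__main__":
    print(greedy_tune(toy_energy))   # ({'size_kb': 4, 'line_b': 32, 'ways': 1}, 100)
```

The greedy search evaluates at most 7 of the 27 possible configurations here, at the risk of missing the global optimum. This matters when tuning must happen at run time.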

Finally, we introduce a compaction-free compressed cache for high-performance multi-core systems. Cache compression is usually applied to the last-level cache to increase its effective capacity. However, because compressed data vary in size, storage fragmentation is inevitable in this cache design. When fragmentation occurs, a compaction process is usually invoked to create contiguous storage space. Compaction incurs extra cycle penalties and degrades the effectiveness of the compressed cache. In this work, we propose a compaction-free compressed cache architecture that completely eliminates the time spent executing compaction.

Introduction
Stacked Signal TSV for Thermal Dissipation in Global Routing for 3D IC
1 Motivation
2 Location of Stacked Signal TSV
2.1 Thermal model
2.2 Distance of stacked signal TSV to heat source
3 Signal TSV Assignment and Relocation for Thermal Dissipation
3.1 Overview of the algorithm
3.2 Initial TSV assignment stage
3.3 Stacked-TSV relocation stage
3.4 TSV restoration for timing-violation stage
4 Experiment
4.1 Experimental environment
4.2 Experimental result
Thread-criticality aware Dynamic Cache Reconfiguration for Multi-core System
1 Motivation
2 Thread-criticality aware Dynamic Cache Reconfiguration
2.1 Thread criticality computation
2.2 Overview of our cache reconfiguration
2.3 Cache line-size adjustment
2.4 Cache capacity adjustment
3 Experiment
3.1 Experimental environment
3.2 Experimental result
Compaction-free Compressed Cache for High Performance Multi-core System
1 Background and Motivation
1.1 Review of decoupled variable-segment cache architecture
1.2 Compaction overhead
2 Compaction-free Compressed Cache
2.1 Architecture
2.2 Hardware implementation
3 Experiment
3.1 Experimental environment
3.2 Performance and energy results
3.3 Area overhead
3.4 Analysis of Performance and Area Overhead with Different Compression Segment Size
Conclusions and Future Work

                                

[1] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC benchmark suite: Characterization and architectural implications," Tech. Rep., 2008.
[2] Y.-T. Chen, J. Cong, H. Huang, B. Liu, C. Liu, M. Potkonjak, and G. Reinman, “Dynamically reconfigurable hybrid cache: An energy-efficient last-level cache design," in Proc. Design, Automation, and Test in Europe (DATE), 2012, pp. 45-50.
[3] A. R. Alameldeen and D. A. Wood, “Frequent pattern compression: A significance-based compression scheme for L2 caches," Technical Report 1500, University of Wisconsin-Madison, Computer Sciences Department, 2004.
[4] S. Sardashti and D. A. Wood, “Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching," in Proc. Microarchitecture (MICRO), 2013, pp. 62-73.
[5] J. Cong and Y. Zhang, “Thermal via planning for 3-D ICs," in Proc. Computer-Aided Design (ICCAD), 2005, pp. 745-752.
[6] M. Pathak and S. K. Lim, “Performance and thermal-aware Steiner routing for 3-D stacked ICs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 28, no. 9, pp. 1373-1386, 2009.
[7] A. R. Alameldeen and D. A. Wood, “Adaptive cache compression for high-performance processors," in Proc. International Symposium on Computer Architecture (ISCA), 2004, pp. 212-223.
[8] K. Athikulwongse, M. Pathak, and S. K. Lim, “Exploiting die-to-die thermal coupling in 3D IC placement," in Proc. Design Automation Conference (DAC), 2012, pp. 741-746.
[9] B. Goplen and S. S. Sapatnekar, “Thermal via placement in 3D ICs," in International Symposium on Physical Design (ISPD), 2005, pp. 309-314.
[10] B. Goplen and S. S. Sapatnekar, “Placement of thermal vias in 3-D ICs using various thermal objectives," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 25, no. 4, pp. 692-709, 2006.
[11] J. Cong and Y. Zhang, “Thermal-driven multilevel routing for 3-D ICs," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), 2005, pp. 121-126.
[12] X. Li, Y. Ma, X. Hong, S. Dong, and J. Cong, “LP based white space redistribution for thermal via planning and performance optimization in 3D ICs," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), 2008, pp. 209-212.
[13] T. Zhang, Y. Zhan, and S. S. Sapatnekar, “Temperature-aware routing in 3D ICs," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), 2006, pp. 309-314.
[14] S. Onakaraiah and C. S. Tan, “Mitigating heat dissipation and thermo-mechanical stress challenges in 3-D IC using thermal through silicon via (TTSV)," in Proc. Electronic Components and Technology (ECTC), 2010, pp. 411-416.
[15] M. Pathak, Y.-J. Lee, T. Moon, and S. K. Lim, “Through-silicon-via management during 3D physical design: When to add and how many?" in Proc. Computer-Aided Design (ICCAD), 2010, pp. 387-394.
[16] H.-T. Chen, H.-L. Lin, T.-C. Wang, and T. Hwang, “A new design architecture for power network in 3D ICs," in Proc. Design, Automation, and Test in Europe (DATE), 2011, pp. 401-406.
[17] A. Malik, B. Moyer, and D. Cermak, “A low power unified cache architecture providing power and performance flexibility," in International Symposium on Low Power Electronics and Design (ISLPED), 2000, pp. 241-243.
[18] R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas, “Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures," in Proc. International Symposium on Microarchitecture (MICRO), 2000, pp. 245-257.
[19] K. Inoue, K. Kai, and K. Murakami, “A high-performance/low-power on-chip memory-path architecture with variable cache-line size," in IEICE Transactions on Electronics, 2000, pp. 1716-1723.
[20] S. Banerjee, S. G, and S. K. Nandy, “Program phase directed dynamic cache way reconfiguration for power efficiency," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), 2007, pp. 884-889.
[21] C. Zhang, F. Vahid, and W. Najjar, “A highly configurable cache for low energy embedded systems," ACM Transactions on Embedded Computing Systems (TECS), vol. 4, pp. 363-387, May 2005.
[22] A. Gordon-Ross and F. Vahid, “A self-tuning configurable cache," in Proc. Design Automation Conference (DAC), 2007, pp. 234-237.
[23] C. Zhang, F. Vahid, and W. Najjar, “A self-tuning cache architecture for embedded systems," ACM Transactions on Embedded Computing Systems (TECS), vol. 3, pp. 407-425, May 2004.
[24] M. Rawlins and A. Gordon-Ross, “An application classification guided cache tuning heuristic for multi-core architectures," in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC), 2012, pp. 23-28.
[25] A. Gordon-Ross, F. Vahid, and N. Dutt, “Automatic tuning of two-level caches to embedded applications," in Proc. Design, Automation, and Test in Europe (DATE), 2004, pp. 208-213.
[26] W. Wang, P. Mishra, and A. Gordon-Ross, “Dynamic cache reconfiguration for soft real-time systems," vol. 11, no. 28, 2012.
[27] D. H. Albonesi, “Selective cache ways: on-demand cache resource allocation," in Proc. International Symposium on Microarchitecture (MICRO), 1999, pp. 248-259.
[28] E. Ahn, S.-M. Yoo, and S.-M. S. Kang, “Effective algorithms for cache-level compression," in Proc. Great Lakes symposium on VLSI (GLSVLSI), 2001, pp. 89-92.
[29] F. Douglis, “The compression cache: Using on-line compression to extend physical memory," in Proc. Winter USENIX Conference, 1993, pp. 519-529.
[30] M. J. Freedman, “The compression cache: Virtual memory compression for handheld computers," Parallel and Distributed Operating Systems Group, MIT Lab for Computer Science, Cambridge, Tech. Rep., 2000.
[31] X. Chen, L. Yang, R. Dick, L. Shang, and H. Lekatsas, “C-Pack: A high-performance microprocessor cache compression algorithm," IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), vol. 18, no. 8, pp. 1196-1208, 2010.
[32] L. Villa, M. Zhang, and K. Asanovic, “Dynamic zero compression for cache energy reduction," in Proc. Microarchitecture (MICRO), 2000, pp. 214-220.
[33] J. Yang, Y. Zhang, and R. Gupta, “Frequent value compression in data caches," in Proc. Microarchitecture (MICRO), 2000, pp. 258-265.
[34] G. Pekhimenko, V. Seshadri, O. Mutlu, M. A. Kozuch, P. B. Gibbons, and T. C. Mowry, “Base-delta-immediate compression: Practical data compression for on-chip caches," in Proc. Parallel Architectures and Compilation Techniques (PACT), 2013, pp. 377-388.
[35] J. Dusser, T. Piquet, and A. Seznec, “Zero-content augmented caches," in Proc. Supercomputing (ICS), 2009, pp. 46-55.
[36] L. Benini, D. Bruni, B. Ricco, A. Macii, and E. Macii, “An adaptive data compression scheme for memory traffic minimization in processor-based systems," in Proc. International Symposium on Circuits and Systems (ISCAS), 2002, pp. 866-869.
[37] L. Benini, D. Bruni, A. Macii, and E. Macii, “Hardware-assisted data compression for energy minimization in systems with embedded processors," in Proc. Design, Automation and Test in Europe (DATE), 2002, pp. 449-453.
[38] J.-S. Lee, W.-K. Hong, and S.-D. Kim, “Design and evaluation of a selective compressed memory system," in Proc. International Conference on Computer Design (ICCD), 1999, pp. 184-191.
[39] E. Hallnor and S. Reinhardt, “A compressed memory hierarchy using an indirect index cache," in Proc. Workshop on Memory Performance Issues (WMPI), 2004, pp. 9-15.
[40] E. Hallnor and S. Reinhardt, “A unified compressed memory hierarchy," in Proc. High-Performance Computer Architecture (HPCA), 2005, pp. 201-212.
[41] A.-R. Adl-Tabatabai, A. M. Ghuloum, and S. O. Kanaujia, “Compression in cache design," in Proc. Supercomputing (ICS), 2007, pp. 190-201.
[42] S. Kim, J. Lee, J. Kim, and S. Hong, “Residue cache: A low-energy low-area L2 cache architecture via compression and partial hits," in Proc. Microarchitecture (MICRO), 2011, pp. 420-429.
[43] W. Chen, W. R. Bottoms, K. Pressel, and J. Wolf, “The next step in assembly and packaging: System level integration in the package (SiP)," ITRS, Tech. Rep. White Paper v9.0.
[44] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, “3-D ICs: A novel chip design for improving deep-submicrometer interconnect performance and systems-on-chip integration," Proceedings of the IEEE, 2001, pp. 602-633.
[45] J. Cong, G. Luo, and Y. Shi, “Thermal-aware cell and through-silicon-via co-placement for 3D ICs," in Proc. Design Automation Conference (DAC), 2011, pp. 670-675.
[46] P. Wilkerson, A. Raman, and M. Turowski, “Fast, automated thermal simulation of three-dimensional integrated circuits," in Proc. Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), 2004, pp. 706-713.
[47] T.-Y. Wang and C. C.-P. Chen, “3-D thermal-ADI: A linear-time chip level transient thermal simulator," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 21, no. 12, pp. 1434-1445, 2002.
[48] W. Huang, M. R. Stan, K. Skadron, K. Sankaranarayanan, S. Ghosh, and S. Velusamy, “Compact thermal modeling for temperature-aware design," in Proc. Design Automation Conference (DAC), 2004, pp. 878-883.
[49] HSPICE Simulation and Analysis User Guide, V-2004.03, Synopsys Inc, Mountain View, CA, 2004.
[50] C.-H. Tsai and S.-M. S. Kang, “Standard cell placement for even on-chip thermal distribution," in International Symposium on Physical Design (ISPD), 1999, pp. 179-184.
[51] Y. K. Cheng, C. C. Teng, A. Dharchoudhury, E. Rosenbaum, and S. M. Kang, “iCET: a complete chip-level thermal reliability diagnosis tool for CMOS VLSI chips," in Proc. Design Automation Conference (DAC), 1996, pp. 548-551.
[52] C. Chu and Y. C. Wong, “FLUTE: Fast lookup table based rectilinear Steiner minimal tree algorithm for VLSI design," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 27, no. 1, pp. 70-83, 2008.
[53] M. C. Yildiz and P. H. Madden, “Improved cut sequences for partitioning driven placement," in Proc. Design Automation Conference (DAC), 2001, pp. 776-779.
[54] K. Mehlhorn and S. Näher, LEDA: a platform for combinatorial and geometric computing. Cambridge University Press, 1999.
[55] Intel. Intel Core i7 processor. http://www.intel.com.
[56] ARM. ARM11MPCore processor. http://www.arm.com.
[57] W. Wang, P. Mishra, and S. Ranka, “Dynamic cache reconfiguration and partitioning for energy optimization in real-time multi-core systems," in Proc. Design Automation Conference (DAC), 2011, pp. 948-953.
[58] A. Bhattacharjee and M. Martonosi, “Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors," in Proc. International Symposium on Computer Architecture (ISCA), 2009, pp. 290-301.
[59] Intel Threading Building Blocks 2.0.
[60] S. P. Muralidhara, M. Kandemir, and P. Raghavan, “Intra-application cache partitioning," in Proc. Parallel and Distributed Processing (IPDPS), 2012, pp. 1-12.
[61] A. Sharifi, S. Srikantaiah, M. Kandemir, and M. J. Irwin, “Courteous cache sharing: Being nice to others in capacity management," in Proc. Design Automation Conference (DAC), 2012, pp. 678-687.
[62] S. Foroutan, A. Sheibanyrad, and F. Petrot, “Cost-efficient buffer sizing in shared-memory 3D-MPSoCs using wide I/O interfaces," in Proc. Design Automation Conference (DAC), 2012, pp. 366-375.
[63] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner, “Simics: A full system simulation platform," IEEE Computer, pp. 50-58, 2002.
[64] M. Martin, D. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood, “Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset," Computer Architecture News, pp. 92-99, 2005.
[65] HP Laboratories Palo Alto, “CACTI 6.5." [Online]. Available: http://www.hpl.hp.com/
[66] Y. Xie and G. Loh, “Thread-aware dynamic shared cache compression in multi-core processors," in Proc. Computer Design (ICCD), 2011, pp. 135-141.
[67] S. Baek, H. G. Lee, C. Nicopoulos, J. Lee, and J. Kim, “ECM: Effective capacity maximizer for high-performance compressed caching," in Proc. High-Performance Computer Architecture (HPCA), 2013, pp. 131-142.
[68] J. S. Lee, W. K. Hong, and S. D. Kim, “An on-chip cache compression technique to reduce decompression overhead and design complexity," Journal of Systems Architecture (JSA), vol. 46, no. 15, pp. 1365-1382, 2000.
[69] D. Chen, E. Peserico, and L. Rudolph, “A dynamically partitionable compressed cache," in Proc. the Singapore-MIT Alliance Symposium, 2003.
[70] A. Seznec, “Decoupled sectored caches: conciliating low tag implementation cost," in Proc. International Symposium on Computer architecture (ISCA), 1994, pp. 384-393.
[71] “SPEC2006 benchmarks." [Online]. Available: http://www.specbench.org/osg/cpu2006/
[72] D. Williamson, ARM Cortex A8: A High Performance Processor for Low Power Applications. ARM.
[73] T. R. Halfhill, “ARM's midsize multiprocessor," Microprocessor Report, vol. 23, no. 10, pp. 17-24, 2009.
[74] E. Rotenberg, S. Bennett, and J. E. Smith, “Trace cache: A low latency approach to high bandwidth instruction fetching," in Proc. Microarchitecture (MICRO), 1996, pp. 24-34.
[75] T. M. Conte, K. N. Menezes, P. M. Mills, and B. A. Patel, “Optimization of instruction fetch mechanisms for high issue rates," in Proc. International Symposium on Computer Architecture (ISCA), 2005, pp. 333-344.
[76] T. K. Prakash and L. Peng, “Performance characterization of SPEC CPU2006 benchmarks on Intel Core 2 Duo processor," in ISAST Transactions on Computers and Software Engineering, 2008, pp. 36-41.

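Full-text availability: not authorized for public release (on-campus network)
Full-text availability: not authorized for public release (off-campus network)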
