Graduate Student: 林佩藍 (LIN, PEI-LAN)
Thesis Title: Compaction-free Compressed Cache for High Performance Multi-core System (在高效能多核心系統平台的免重整之壓縮式快取記憶體架構)
Advisor: 黃婷婷 (Hwang, Ting Ting)
Committee Members: 金仲達 (King, Chung-Ta), 黃俊達 (Huang, Juinn-Dar)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2015
Academic Year of Graduation: 103 (ROC calendar)
Language: English
Number of Pages: 29
Keywords (Chinese): 壓縮式快取記憶體, 高效能多核心系統, 末級快取記憶體, 快取記憶體重整
Keywords (English): Compressed Cache, High Performance Multi-core System, Last Level Cache, Cache Compaction
Abstract: Compressed caches have been used in the shared last-level cache (LLC) to increase its effective capacity [1]. However, because compressed data blocks vary in size, storage fragmentation is inevitable in such a cache design. When fragmentation occurs, a compaction process is usually invoked to create contiguous storage space. This compaction process incurs an extra cycle penalty and degrades the effectiveness of the compressed cache design. In this thesis, we propose a compaction-free compressed cache architecture that completely eliminates the time spent on compaction. Based on this design, we demonstrate that, compared with a conventional cache, our results improve system performance by 16% and reduce energy by 16%. Compared with the work by Alameldeen et al. [1], our design achieves 5% more performance improvement and 3% more energy reduction; compared with the work by Sardashti et al. [2], it achieves 3% more performance improvement and 2% more energy reduction.
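To make the fragmentation and compaction behaviour concrete, below is a minimal, hypothetical C++ sketch (not the thesis's proposed architecture): one set of a compressed cache whose fixed 64-byte data segment stores variably sized compressed blocks. Evicting a block from the middle leaves a free gap; a later insertion that fits the total free space but not the tail then forces a compaction pass, and the bytes moved stand in for the extra cycle penalty that a compaction-free design avoids. The class name, segment size, and block sizes are illustrative assumptions.

```cpp
// Hypothetical sketch of one compressed-cache set, not the thesis's design.
// A fixed 64-byte data segment holds compressed blocks of varying size; when
// free space exists but is not contiguous at the tail, a compaction pass
// slides blocks together before the new block can be placed.
#include <algorithm>
#include <iostream>
#include <vector>

struct Block { int tag; int offset; int size; };   // placement of one compressed block

class CompressedSet {
  static constexpr int kCapacity = 64;             // bytes of data storage per set (assumed)
  std::vector<Block> blocks_;

  int freeBytes() const {
    int used = 0;
    for (const Block& b : blocks_) used += b.size;
    return kCapacity - used;
  }

  // Compaction: slide blocks toward offset 0 so all free space becomes contiguous.
  // The number of bytes moved models the extra cycle penalty of a compaction pass.
  int compact() {
    std::sort(blocks_.begin(), blocks_.end(),
              [](const Block& a, const Block& b) { return a.offset < b.offset; });
    int moved = 0, cursor = 0;
    for (Block& b : blocks_) {
      if (b.offset != cursor) { moved += b.size; b.offset = cursor; }
      cursor += b.size;
    }
    return moved;
  }

 public:
  // Returns false if total free space is insufficient (a real cache would evict).
  bool insert(int tag, int compressedSize) {
    if (freeBytes() < compressedSize) return false;
    int end = 0;                                   // first byte past the last placed block
    for (const Block& b : blocks_) end = std::max(end, b.offset + b.size);
    if (kCapacity - end < compressedSize) {        // enough free bytes, but fragmented
      int moved = compact();
      std::cout << "compaction moved " << moved << " bytes\n";
      end = kCapacity - freeBytes();               // blocks are now packed from offset 0
    }
    blocks_.push_back({tag, end, compressedSize});
    return true;
  }

  // Remove a block, e.g. on invalidation or eviction; this is what creates gaps.
  void erase(int tag) {
    blocks_.erase(std::remove_if(blocks_.begin(), blocks_.end(),
                                 [tag](const Block& b) { return b.tag == tag; }),
                  blocks_.end());
  }
};

int main() {
  CompressedSet set;
  set.insert(0, 24);
  set.insert(1, 16);
  set.insert(2, 24);   // the 64-byte segment is now full
  set.erase(1);        // frees 16 bytes in the middle, so free space is not contiguous
  set.insert(3, 16);   // fits by total size, but only after a compaction pass
  return 0;
}
```

In this example, freeing the middle 16-byte block leaves 16 free bytes away from the tail, so inserting the next 16-byte block triggers one compaction that moves the 24-byte block; a compaction-free organization aims to avoid exactly this data movement.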
[1] A. R. Alameldeen and D. A. Wood, “Adaptive cache compression for high-performance processors,” Proc. the 31st Annual International Symposium on Computer Architecture (ISCA), pp. 212–223, 2004.
[2] S. Sardashti and D. A. Wood, “Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching,” Proc. the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 62–73, 2013.
[3] A. R. Alameldeen and D. A. Wood, “Frequent pattern compression: A significance-based compression scheme for L2 caches,” Technical Report 1500, Computer Sciences Department, University of Wisconsin-Madison, 2004.
[4] E. Ahn, S.-M. Yoo, and S.-M. S. Kang, “Effective algorithms for cache-level compression,” Proc. the 11th Great Lakes Symposium on VLSI (GLSVLSI), pp. 89–92, 2001.
[5] F. Douglis, “The compression cache: Using on-line compression to extend physical memory,” Proc. 1993 Winter USENIX Conference, pp. 519–529, 1993.
[6] M. J. Freedman, “The compression cache: Virtual memory compression for handheld computers,” Parallel and Distributed Operating Systems Group, MIT Lab for Computer Science, Cambridge, Tech. Rep., 2000.
[7] X. Chen, L. Yang, R. Dick, L. Shang, and H. Lekatsas, “C-Pack: A high-performance microprocessor cache compression algorithm,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 18, no. 8, pp. 1196–1208, 2010.
[8] L. Villa, M. Zhang, and K. Asanovic, “Dynamic zero compression for cache energy reduction,” Proc. the 33rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 214–220, 2000.
[9] J. Yang, Y. Zhang, and R. Gupta, “Frequent value compression in data caches,” Proc. the 33rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 258–265, 2000.
[10] G. Pekhimenko, V. Seshadri, O. Mutlu, M. A. Kozuch, P. B. Gibbons, and T. C. Mowry, “Base-delta-immediate compression: Practical data compression for on-chip caches,” Proc. the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 377–388, 2012.
[11] J. Dusser, T. Piquet, and A. Seznec, “Zero-content augmented caches,” Proc. the 23rd International Conference on Supercomputing (ICS), pp. 46–55, 2009.
[12] E. Hallnor and S. Reinhardt, “A compressed memory hierarchy using an indirect index cache,” Proc. the 3rd Workshop on Memory Performance Issues (WMPI), in conjunction with the 31st International Symposium on Computer Architecture, pp. 9–15, 2004.
[13] E. Hallnor and S. Reinhardt, “A unified compressed memory hierarchy,” Proc. High-Performance Computer Architecture (HPCA), pp. 201–212, 2005.
[14] S. Kim, J. Lee, J. Kim, and S. Hong, “Residue cache: A low-energy low-area L2 cache architecture via compression and partial hits,” Proc. the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 420–429, 2011.
[15] Y. Xie and G. Loh, “Thread-aware dynamic shared cache compression in multi-core processors,” Proc. International Conference on Computer Design (ICCD), pp. 135–141, 2011.
[16] S. Baek, H. G. Lee, C. Nicopoulos, J. Lee, and J. Kim, “ECM: Effective capacity maximizer for high-performance compressed caching,” Proc. High-Performance Computer Architecture (HPCA), pp. 131–142, 2013.
[17] J.-S. Lee, W.-K. Hong, and S.-D. Kim, “An on-chip cache compression technique to reduce decompression overhead and design complexity,” Journal of Systems Architecture (JSA), vol. 46, no. 15, pp. 1365–1382, 2000.
[18] D. Chen, E. Peserico, and L. Rudolph, “A dynamically partitionable compressed cache,” Proc. the Singapore-MIT Alliance Symposium, 2003.
[19] L. Benini, D. Bruni, B. Ricco, A. Macii, and E. Macii, “An adaptive data compression scheme for memory traffic minimization in processor-based systems,” Proc. IEEE International Symposium on Circuits and Systems (ISCAS), pp. 866–869, 2002.
[20] L. Benini, D. Bruni, A. Macii, and E. Macii, “Hardware-assisted data compression for energy minimization in systems with embedded processors,” Proc. the Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 449–453, 2002.
[21] J.-S. Lee, W.-K. Hong, and S.-D. Kim, “Design and evaluation of a selective compressed memory system,” Proc. International Conference on Computer Design (ICCD), pp. 184–191, 1999.
[22] A.-R. Adl-Tabatabai, A. M. Ghuloum, and S. O. Kanaujia, “Compression in cache design,” Proc. the 21st Annual International Conference on Supercomputing (ICS), pp. 190–201, 2007.
[23] A. Seznec, “Decoupled sectored caches: Conciliating low tag implementation cost,” Proc. the 21st Annual International Symposium on Computer Architecture (ISCA), pp. 384–393, 1994.
[24] “SPEC2006 benchmarks.” [Online]. Available: http://www.specbench.org/osg/cpu2006/
[25] E. Rotenberg, S. Bennett, and J. E. Smith, “Trace cache: A low latency approach to high bandwidth instruction fetching,” Proc. the 29th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 24–34, 1996.
[26] T. M. Conte, K. N. Menezes, P. M. Mills, and B. A. Patel, “Optimization of instruction fetch mechanisms for high issue rates,” Proc. the 22nd Annual International Symposium on Computer Architecture (ISCA), pp. 333–344, 1995.
[27] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner, “Simics: A full system simulation platform,” IEEE Computer, pp. 50–58, 2002.
[28] M. Martin, D. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood, “Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset,” Computer Architecture News, pp. 92–99, 2005.
[29] T. K. Prakash and L. Peng, “Performance characterization of SPEC CPU2006 benchmarks on Intel Core 2 Duo processor,” ISAST Transactions on Computers and Software Engineering, pp. 36–41, 2008.
[30] C. Zhang, F. Vahid, and W. Najjar, “A highly configurable cache for low energy embedded systems,” ACM Transactions on Embedded Computing Systems (TECS), pp. 363–387, 2005.
[31] HP Laboratories Palo Alto, “CACTI 6.5.” [Online]. Available: http://www.hpl.hp.com/