A Fast-and-Effective Early-Stage Multi-level Cache Optimization Method Based on Reuse-Distance Analysis


Graduate student:	蔡政霖 Tsai, Cheng Lin
Thesis title:	一套以再使用距離的分析方法在設計初期去快速且有效的針對多層快取做最佳化 (A Fast-and-Effective Early-Stage Multi-level Cache Optimization Method Based on Reuse-Distance Analysis)
Advisor:	蔡仁松 Tsay, Ren Song
Oral defense committee:	許雅三 Hsu, Yar Sun; 楊佳玲 Yang, Chia Lin
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2016
Academic year of graduation:	104 (ROC calendar)
Number of pages:	34
Keywords:	reuse distance, multi-level cache

In this thesis, we generalize the reuse-distance analysis method and use it to develop an efficient and practical approach to multi-level cache design optimization. For optimization targets such as cache size, cost, power consumption, or average data access latency, we adopt a simple scanning search to locate the optimal cache configuration. The proposed approach is verified to be 150 to 250 times faster than the traditional simulation-based approach, which makes it particularly useful to system designers in the early design phase. In addition, we propose a simplified analytical model that gives designers insight into how cache design parameters affect the optimization results. As a result, designers can set an appropriate optimization direction early in system design.
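The general idea behind the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, which is not shown in this record. It assumes fully associative LRU caches, a two-level hierarchy, and made-up latencies; the function names (`reuse_distances`, `hits_below`, `amat`, `scan`) and all parameter values are hypothetical. The point it shows is that the reuse-distance histogram is collected in one pass over the trace, after which every candidate configuration is evaluated from the histogram alone, with no further simulation.

```python
from collections import Counter
from itertools import product


def reuse_distances(trace):
    """One pass over the trace; returns a histogram {distance: count}.

    The distance is the number of distinct blocks touched since the previous
    access to the same block. First-time accesses are counted under None.
    """
    stack = []  # LRU stack, most recently used block at the end
    hist = Counter()
    for block in trace:
        if block in stack:
            idx = stack.index(block)
            hist[len(stack) - 1 - idx] += 1
            stack.pop(idx)
        else:
            hist[None] += 1  # cold miss
        stack.append(block)
    return hist


def hits_below(hist, capacity):
    """Accesses that hit a fully associative LRU cache of `capacity` blocks."""
    return sum(n for d, n in hist.items() if d is not None and d < capacity)


def amat(hist, c1, c2, lat_l1=1, lat_l2=10, lat_mem=100, exclusive=False):
    """Average access time for an L1/L2 pair (latencies are assumed values).

    Assumption: an inclusive L2 behaves like an LRU cache of c2 blocks,
    an exclusive L2 extends the effective capacity to c1 + c2 blocks.
    """
    total = sum(hist.values())
    h1 = hits_below(hist, c1)
    h2 = hits_below(hist, c1 + c2 if exclusive else c2) - h1
    misses = total - h1 - h2
    cycles = (h1 * lat_l1
              + h2 * (lat_l1 + lat_l2)
              + misses * (lat_l1 + lat_l2 + lat_mem))
    return cycles / total


def scan(hist, sizes, budget, **kwargs):
    """Scanning search: try every (c1, c2) pair within the block budget."""
    best = None
    for c1, c2 in product(sizes, repeat=2):
        if c2 < c1 or c1 + c2 > budget:
            continue  # skip an L2 smaller than L1, or over budget
        cost = amat(hist, c1, c2, **kwargs)
        if best is None or cost < best[0]:
            best = (cost, c1, c2)
    return best


if __name__ == "__main__":
    hist = reuse_distances("abcabc")  # each character stands for one block
    print(scan(hist, sizes=[1, 2, 4], budget=5))
```

The list-based LRU stack keeps the sketch short but costs time linear in the stack depth per access; a tree-based structure would be the usual choice for long traces. Cost or power objectives could be scanned the same way by swapping `amat` for a different cost function over the same histogram.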

Contents
List of Tables
List of Figures
1 Introduction
2 Related Work
2.1 Design Space Exploration through Simulations
2.2 Replacement Policy Optimization
2.3 Reuse Distance Analysis
3 The Reuse-Distance Approach for Multi-level Cache Designs
3.1 Exclusive/Inclusive Cache
3.2 Applying Reuse Distance in Multi-level Designs
4 A Systematic Cache Optimization Method
4.1 Scanning Search for Optimal Designs
4.2 Analytical Model Based on Reuse Distance
5 Experiments
5.1 Methodology
5.2 Evaluation Results
5.3 Verify Insensitivity to Replacement Policy and Way-associativity
5.4 Discussions
6 Conclusions
Bibliography

                                


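Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)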
