
Author: 蔡政霖 (Tsai, Cheng Lin)
Thesis title: A Fast-and-Effective Early-Stage Multi-level Cache Optimization Method Based on Reuse-Distance Analysis
Original title: 一套以再使用距離的分析方法在設計初期去快速且有效的針對多層快取做最佳化
Advisor: 蔡仁松 (Tsay, Ren Song)
Oral defense committee: 許雅三 (Hsu, Yar Sun), 楊佳玲 (Yang, Chia Lin)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2016
Academic year of graduation: 104
Number of pages: 34
Keywords: reuse distance, multi-level cache
  • In this thesis, we generalize the reuse-distance analysis method and use it to develop an efficient and practical approach to multi-level cache design optimization. We adopt a simple scanning search to locate the optimal cache configuration in terms of cache size, cost, power consumption, or average data access delay. The proposed approach is verified to be 150 to 250 times faster than the traditional simulation-based approach, which makes it particularly useful to system designers in the early design phase. In addition, we propose a simplified analytical model that gives designers insight into how cache design parameters affect the optimization results. As a result, designers can set an appropriate design direction early in the system design phase.
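To make the idea in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It assumes fully associative LRU caches measured in blocks, and it approximates a two-level hierarchy by treating the L2 global miss ratio as the miss ratio of a single cache of the L2 size. All function names, latencies, per-block costs, and the budget constraint are hypothetical values chosen for illustration.

```python
def reuse_distances(trace):
    """Reuse distance of each access: the number of distinct addresses
    touched since the previous access to the same address, or None for
    a first-time (cold) access."""
    stack = []       # LRU stack, most recently used address at the end
    distances = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)
        stack.append(addr)
    return distances


def miss_ratio(distances, capacity):
    """Miss ratio of a fully associative LRU cache of `capacity` blocks:
    an access misses if it is cold or its reuse distance >= capacity."""
    misses = sum(1 for d in distances if d is None or d >= capacity)
    return misses / len(distances)


def scan_two_level(distances, l1_sizes, l2_sizes, budget,
                   t1=1.0, t2=10.0, t_mem=100.0, cost1=4.0, cost2=1.0):
    """Scan every (L1, L2) size pair within the cost budget and return
    (average access time, L1 size, L2 size) for the best pair found.
    Latencies and per-block costs are assumed, illustrative numbers."""
    best = None
    for c1 in l1_sizes:
        m1 = miss_ratio(distances, c1)
        for c2 in l2_sizes:
            if c2 <= c1 or c1 * cost1 + c2 * cost2 > budget:
                continue
            m2 = miss_ratio(distances, c2)   # assumed global L2 miss ratio
            amat = t1 + m1 * t2 + m2 * t_mem
            if best is None or amat < best[0]:
                best = (amat, c1, c2)
    return best


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "c", "d", "a"]
    dists = reuse_distances(trace)
    print(scan_two_level(dists, [1, 2, 3], [2, 3, 4], budget=12))
```

The point of the sketch is that the trace is processed once to obtain the reuse distances, after which every candidate configuration is evaluated by a cheap count rather than a fresh simulation, which is consistent with the speedup the abstract reports over simulation-based exploration.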

    Contents
    List of Tables
    List of Figures
    1. Introduction
    2. Related Work
       2.1 Design Space Exploration through Simulations
       2.2 Replacement Policy Optimization
       2.3 Reuse Distance Analysis
    3. The Reuse-Distance Approach for Multi-level Cache Designs
       3.1 Exclusive/Inclusive Cache
       3.2 Applying Reuse Distance in Multi-level Designs
    4. A Systematic Cache Optimization Method
       4.1 Scanning Search for Optimal Designs
       4.2 Analytical Model Based on Reuse Distance
    5. Experiments
       5.1 Methodology
       5.2 Evaluation Results
       5.3 Verify Insensitivity to Replacement Policy and Way-associativity
       5.4 Discussions
    6. Conclusions
    Bibliography


    Full text: not authorized for public release (on-campus and off-campus networks)
