Author: 何芯瑀
Ho, Hsin-Yu
Thesis title: 運用「資訊再使用距離」的多核心多層快取系統高效能初期設計最佳化方法
An Effective Early Multi-core System Shared Cache Design Method Based on Reuse-distance Analysis
Advisor: 蔡仁松
Tsay, Ren-Song
Oral defense committee: 許雅三
Hsu, Yar-Sun
呂士濂
Lu, Shih-Lien
Degree: Master's
Department:
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Number of pages: 36
Keywords (Chinese): 多核心, 多層快取記憶體, 累加的資訊再使用距離
Keywords (English): multi-core, multi-level caches, aggregated reuse distance
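  • Abstract (translated from the Chinese): In this thesis, we use reuse distance to analyze the data traces of target programs and propose an effective method for optimizing the shared-cache design of a multi-core system. Because a program's data trace is independent of the system hardware architecture, designers can readily use our method to find the best cache design at an early design stage. We devise an efficient and accurate way to generate the aggregated reuse-distance histogram of concurrently executing programs, and use it to analyze and optimize cache performance accurately. More importantly, the contents of the actual shared cache resemble the aggregated reuse-distance histogram, which is why the proposed method is effective. In our experiments, the cache miss counts estimated by our method differ from the actual cache miss counts by less than 3.2%. A simple scanning search is enough to determine an optimal cache design early in system design.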


    In this thesis, we propose an effective and efficient multi-core shared-cache design optimization approach based on reuse-distance analysis of the data traces of target applications. Since data traces are independent of the system hardware architecture, a designer can easily compute the best cache design at an early system design phase using our approach. We devise an efficient yet accurate method to derive the aggregated reuse-distance histograms of concurrent applications for accurate cache performance analysis and optimization. Essentially, the actual shared-cache contention results of concurrent applications are embedded in the aggregated reuse-distance histograms, which is why the approach is effective. The experimental results show that the average error rate of the shared-cache miss-count estimates produced by our approach is less than 3.2%. Using a simple scanning search method, one can easily determine the true optimal cache configurations at an early system design phase.
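
    The ideas named in the abstract (a reuse-distance histogram, a miss-count estimate read off that histogram, and a scanning search over cache sizes) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: it handles a single trace rather than aggregating concurrent applications, it assumes a fully associative LRU cache, and every function name, the toy trace, and the miss budget are hypothetical choices made for illustration.

```python
from collections import Counter


def reuse_distance_histogram(trace):
    """Return (histogram, cold_misses) for a sequence of block addresses.

    The reuse distance of an access is the number of distinct blocks
    touched since the previous access to the same block.
    """
    stack = []        # LRU stack, most recently used block at the end
    hist = Counter()  # reuse distance -> number of accesses
    cold = 0          # first-touch (compulsory) misses
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            # Blocks above `pos` are the distinct blocks seen since last use.
            hist[len(stack) - 1 - pos] += 1
            stack.pop(pos)
        else:
            cold += 1
        stack.append(block)
    return hist, cold


def estimated_misses(hist, cold, capacity_blocks):
    """Estimate misses for a fully associative LRU cache (assumed model).

    An access misses when its reuse distance is at least the capacity.
    """
    return cold + sum(n for d, n in hist.items() if d >= capacity_blocks)


def smallest_capacity_within_budget(hist, cold, candidates, miss_budget):
    """Scan candidate capacities in ascending order; return the first that
    keeps the estimated miss count within the (hypothetical) budget."""
    for cap in sorted(candidates):
        if estimated_misses(hist, cold, cap) <= miss_budget:
            return cap
    return None


# Toy trace of block addresses (illustrative only).
trace = ["a", "b", "c", "a", "b", "d", "a"]
hist, cold = reuse_distance_histogram(trace)
best = smallest_capacity_within_budget(hist, cold, [1, 2, 3, 4], miss_budget=4)
```

    Because the histogram is computed once from the trace and is independent of the cache size, each candidate capacity in the scan costs only a sum over histogram bins rather than a new simulation. The list-based stack is quadratic in the worst case and is used here only for readability.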

    Contents
    List of Tables
    List of Figures
    1. Introduction
    2. Related Work
    3. Shared Cache Design Optimization
      3.1 Aggregated reuse-distance computation
      3.2 Cache configuration optimization
    4. Experiments
      4.1 Experimental setup
      4.2 Results of One-level Shared Cache Designs
      4.3 Results of Two-level Cache Designs
      4.4 Optimal Cache Designs
      4.5 Discussions
    5. Conclusion
    Bibliography

