Author: Tsao, Jean (曹靜)
Thesis title: Cache Interference Free Architecture in Snoop Based Cache Coherence Protocol (在窺探式快取記憶體資料一致性協定下之無干擾快取架構)
Advisor: Chang, Shih Chieh (張世杰)
Oral defense committee: King, Chung Ta (金仲達); Jone, Wen Ben (鍾文邦)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2015
Academic year of graduation: 103
Language: English
Number of pages: 40
Keywords (Chinese): 窺探式快取記憶體資料一致性協定, 無干擾快取架構
Keywords (English): Snoop Based Cache Coherence Protocol, Cache Interference Free
    In a write-invalidate, snoop-based cache coherence protocol, write operations that affect coherence are broadcast to invalidate data in other caches. In general, cache invalidation interferes with normal cache operation and degrades performance, because each invalidation requires a cache address lookup. However, many of the invalidation messages snooped by a cache are redundant. In this thesis, we classify the redundant cache invalidation messages and propose a novel architecture that eliminates the interference that every category of redundant invalidation message causes to the cache. Conceptually, we use a table to store the incoming invalidation messages. To realize this table, however, we must resolve its limited capacity. We therefore also propose a method for deleting outdated invalidation messages, which resolves the size limitation of the table. Our experimental results show that, on average, our approach is 8.0% faster than the MESI protocol on four processors and 30.01% faster than MESI on eight processors.
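
    The table idea in the abstract can be illustrated with a toy software model. The sketch below is only one possible reading of "use a table to store the incoming invalidation messages": the class name `DeferredInvalidationCache`, its methods, and especially the overflow policy are hypothetical and are not taken from the thesis, whose actual hardware design and method for deleting outdated entries are not described in this record.

```python
from collections import OrderedDict


class DeferredInvalidationCache:
    """Toy private cache that defers snooped invalidations into a small table.

    Hypothetical illustration only; not the architecture from the thesis.
    """

    def __init__(self, table_capacity):
        self.lines = {}                  # address -> cached value
        self.pending = OrderedDict()     # addresses with a deferred invalidation
        self.table_capacity = table_capacity
        self.snoop_lookups = 0           # cache lookups caused by snooped messages

    def snoop_invalidate(self, address):
        """Record a snooped invalidation without touching the cache array."""
        if address in self.pending:
            return                       # duplicate message, nothing new to record
        if len(self.pending) >= self.table_capacity:
            # Placeholder overflow policy (an assumption): apply the oldest
            # entry to the cache directly, which costs one cache lookup.
            oldest, _ = self.pending.popitem(last=False)
            self.snoop_lookups += 1
            self.lines.pop(oldest, None)
        self.pending[address] = True

    def read(self, address, memory):
        """Return the value at address, reloading if an invalidation is pending."""
        if address in self.pending:
            del self.pending[address]    # the deferred invalidation takes effect now
            self.lines.pop(address, None)
        if address not in self.lines:
            self.lines[address] = memory[address]  # miss: reload from memory
        return self.lines[address]


if __name__ == "__main__":
    memory = {0x10: 1, 0x20: 2}
    cache = DeferredInvalidationCache(table_capacity=2)
    cache.read(0x10, memory)
    memory[0x10] = 99                    # stands in for a write by another core
    cache.snoop_invalidate(0x10)
    print(cache.snoop_lookups, cache.read(0x10, memory))
```

    In this model a snooped invalidation costs a cache lookup only when the table overflows; otherwise the stale line is discarded lazily on the next local access to that address. A real design would also need to bound the cost of checking the table on every access, which this sketch ignores.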

    List of Contents
    List of Figures
    List of Tables
    CHAPTER 1 INTRODUCTION
    CHAPTER 2 BACKGROUND
    CHAPTER 3 CACHE INTERFERENCE FREE ARCHITECTURE
        3.1 Definition
        3.2 Concept Overview
    CHAPTER 4 RESOLVING LIMIT-CAPACITY ISSUE OF INVALIDATION TABLE
    CHAPTER 5 IMPLEMENTATION DETAILS
        5.1 Hardware Structure
        5.2 Implementation of Invalidation Table
        5.3 Multi-level Caches
        5.4 Determination of Reloading
    CHAPTER 6 EVALUATION METHODOLOGY
        6.1 Simulation and System Configuration
        6.2 Benchmarks
    CHAPTER 7 EXPERIMENTAL RESULTS
        7.1 Performance Results
        7.2 Impact on Checking the Invalidation Table
        7.3 Limit-capacity of Invalidation Table
        7.4 Different Numbers of Processors
    CHAPTER 8 RELATED WORK
    CHAPTER 9 CONCLUSIONS
    REFERENCES

