
Author: 黎晉丞 (Li, Jing Cheng)
Thesis title: An Efficient Architecture for Resolving the Aging Problem of Snoop Filter (解決探聽過濾器過時化問題的高效架構)
Advisor: 張世杰 (Chang, Shih Chieh)
Committee members: 金仲達 (King, Chung Ta); 鍾文邦 (Jone, Wen Ben)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 42
Keywords: Snoop-based coherence protocol (探聽式一致性協定), Snoop filter (探聽式過濾器), Filter rejuvenation (過濾器復興)
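    Cache coherence is the mechanism that keeps shared data held in caches consistent. Among coherence schemes, snoop-based coherence protocols are very common in multiprocessor system-on-chip applications because of their simplicity. To answer each snoop request, the cache controller performs a cache tag lookup on the tags of its cache lines to determine whether the requested data is present in the cache. Previous studies show that, because the amount of data shared between nodes is limited, about 90% of snoop requests are redundant. These redundant requests waste energy by forcing the cache controller to perform tag lookups. The snoop filter was therefore proposed to screen out useless snoop requests. A snoop filter must compress the addresses of all data that the cache has loaded into the filter. Because of this compression, the filter can make wrong decisions, called false positives: a false-positive request passes the filter, proceeds to a cache tag lookup, and only then turns out to be redundant. Over time, the large amount of compressed information accumulated in the filter raises the probability of false positives, so an inefficient, aged filter causes many wasted tag lookups.
    To address the problems caused by an inefficient, aged filter, IBM proposed a method for refreshing the filter, with the refresh triggered when a cache wrap occurs. If cache wraps occur too far apart, the filter starts to lose efficiency, and even a refresh may fail to achieve its purpose. We find that in some applications (SPLASH-2) the interval between cache wraps is long, and the false-positive probability of the filter rises in the meantime. In this thesis we therefore focus on how to refresh an aged filter rather than on how to design a filter. We propose our filter rejuvenation technique to solve the problems caused by an inefficient, aged filter.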


    The snoop-based coherence protocol is very popular in multiprocessor systems because of its simplicity. In a snoop-based system, many cache tag lookups are needed to serve snoop requests. However, it has been shown that about 90% of snoop requests are useless, so the corresponding cache lookups are redundant. To reduce unnecessary cache lookups, the snoop filter scheme was proposed. It is known, however, that the efficiency of a snoop filter decreases with time: an aging filter fails to filter out many unnecessary requests. To solve the problem of an aging snoop filter, [8] proposed a novel way to rejuvenate the filter so that it regains high efficiency. We observe that in several real designs, [8] fails to achieve effective rejuvenation. In this paper, we focus on how to rejuvenate a snoop filter rather than on how to design the snoop filter itself. We propose four filter rejuvenation techniques for rejuvenating an aging snoop filter. Our experimental results show that the proposed techniques, when working together, reduce the number of unnecessary requests to 62.23% and the energy consumption to 67.58% on average. In the best case, the number of unnecessary requests drops to approximately 30% compared to [8].
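
    The aging effect and the cache-wrap refresh described in the abstracts can be illustrated with a toy model. The sketch below is not the design of [8] or of this thesis: it assumes a hypothetical single-hash presence bit-vector as the filter, a direct-mapped cache, and a simplified rejuvenation step that rebuilds the filter from the lines currently cached once every line has been replaced (a cache wrap). All class and function names are made up for illustration.

```python
class SnoopFilter:
    """Hypothetical filter: one presence bit per hashed line address."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.bits = [False] * num_bits

    def learn(self, addr):
        # Record that the cache loaded this address. Bits are never cleared
        # on eviction, which is what makes the filter age.
        self.bits[addr % self.num_bits] = True

    def may_be_cached(self, addr):
        # False means "definitely not cached"; True may be a false positive.
        return self.bits[addr % self.num_bits]

    def rejuvenate(self, cached_addrs):
        # Simplifying assumption: rebuild from the current cache contents.
        self.bits = [False] * self.num_bits
        for addr in cached_addrs:
            self.learn(addr)


class DirectMappedCache:
    def __init__(self, num_lines, snoop_filter, rejuvenate_on_wrap):
        self.lines = [None] * num_lines
        self.replaced = [False] * num_lines  # replaced since last refresh
        self.snoop_filter = snoop_filter
        self.rejuvenate_on_wrap = rejuvenate_on_wrap

    def load(self, addr):
        idx = addr % len(self.lines)
        if self.lines[idx] != addr:
            self.lines[idx] = addr
            self.replaced[idx] = True
        self.snoop_filter.learn(addr)
        # Cache wrap: every line has been replaced since the last refresh.
        if self.rejuvenate_on_wrap and all(self.replaced):
            self.snoop_filter.rejuvenate(
                [a for a in self.lines if a is not None])
            self.replaced = [False] * len(self.lines)

    def snoop(self, addr):
        """Return (tag_lookup_performed, hit)."""
        if not self.snoop_filter.may_be_cached(addr):
            return (False, False)  # filtered out, no tag lookup
        return (True, self.lines[addr % len(self.lines)] == addr)


def wasted_lookups(rejuvenate_on_wrap):
    cache = DirectMappedCache(4, SnoopFilter(16), rejuvenate_on_wrap)
    for addr in range(16):
        cache.load(addr)
    # Addresses 0..11 have been evicted, so any tag lookup for them is wasted.
    return sum(1 for addr in range(12) if cache.snoop(addr)[0])


if __name__ == "__main__":
    print("aged filter:", wasted_lookups(False))
    print("rejuvenated filter:", wasted_lookups(True))
```

    In this toy run the aged filter passes all 12 redundant snoops on to the tag array, while the filter refreshed on every cache wrap passes none. The slow-cache-wrap case studied in the thesis corresponds to workloads where `all(self.replaced)` rarely becomes true, so the refresh seldom fires.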

    CONTENTS
    CHINESE ABSTRACT
    ABSTRACT
    CONTENTS
    LIST OF TABLES
    LIST OF FIGURES
    Chapter 1 INTRODUCTION
    Chapter 2 BACKGROUND AND MOTIVATION
        2.1 Snoop Filter
            2.1.1 Removing addresses from snoop filter
        2.2 Filter Rejuvenation
        2.3 Motivation
            2.3.1 Slow cache wrap
            2.3.2 Stubborn set
    Chapter 3 FILTER REJUVENATION TECHNIQUE
        3.1 Architecture Assumption
    Chapter 4 FOUR TYPES OF REJUVENATION TECHNIQUES
        4.1 Type 0 rejuvenation technique
        4.2 Type 1 rejuvenation technique
        4.3 Type 2 rejuvenation technique
            4.3.1 Self-invalidation of a cache line
            4.3.2 Problems of self-invalidation
            4.3.3 No stubborn line in L1
            4.3.4 L1 cache wrap detection
            4.3.5 Reduce redundant learnings
        4.4 Type 3 rejuvenation technique
            4.4.1 L1 cache hardware modification
        4.5 Cache wrap condition
    Chapter 5 INTEGRATION FOR FOUR FILTER REJUVENATION TECHNIQUES
        5.1 Type 0 state
        5.2 Type 1 state
        5.3 Type 2 state or type 3 state
    Chapter 6 EXPERIMENTAL RESULTS
        6.1 Architectural Simulation Setup
        6.2 Filter rejuvenation analysis
            6.2.1 False positive analysis
            6.2.2 Filter learning analysis
    Chapter 7 RELATED WORK
    Chapter 8 CONCLUSIONS
    REFERENCE

    LIST OF TABLES
    Table 1: Conditions of cache wrap for all four techniques
    Table 2: SPLASH-2 benchmark characteristics
    Table 3: Multicore architecture modeled for SESC
    Table 4: Number of cache wraps

    LIST OF FIGURES
    Figure 1: Cache system with a snoop filter
    Figure 2: Snoop filter false positive rate with cache wrap segment
    Figure 3: Two stubborn set examples
    Figure 4: Memory architecture assumption
    Figure 5: Type 1 rejuvenation technique architecture
    Figure 6: Type 2 rejuvenation technique architecture
    Figure 7: Type 3 rejuvenation technique architecture
    Figure 8: Two kinds of the state machines to integrate our filter rejuvenation techniques
    Figure 9: SPLASH-2 benchmarks with our filter rejuvenation techniques
    Figure 10: Snoop based system with source-based snoop filters
    Figure 11: Snoop based system with destination-based snoop filters

    REFERENCES
    [1] E. Atoofian and A. Baniasadi, “Using supplier locality in power-aware interconnects and caches in chip multiprocessors,” J. Systems Architecture 54(5): 507-518, 2008.
    [2] E. Atoofian, A. Baniasadi and K. Aasaraai, “Speculative supplier identification for reducing power of interconnects in snoopy cache coherence protocols,” CF 2007: 259-266.
    [3] M. Blumrich, V. Salapura and A. Gara, “Exploring the architecture of a stream register-based snoop filter,” 2011.
    [4] A. Moshovos, “RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence,”
    [5] A. Moshovos, G. Memik, B. Falsafi and A. Choudhary, “JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers,” HPCA, 2001.
    [6] J. Nilsson, A. Landin and P. Stenstrom, “The Coherence Predictor Cache: A Resource-Efficient and Accurate Coherence Prediction Infrastructure,” IPDPS, 2003.
    [7] J. Renau et al., SESC simulator, January 2005. http://sesc.sourceforge.net.
    [8] V. Salapura, M. A. Blumrich and A. Gara, “Design and implementation of the Blue Gene/P snoop filter,” HPCA, 2008.
    [9] V. Salapura, M. Blumrich and A. Gara, “Improving the accuracy of snoop filtering using stream registers,” MEDEA, 2007.
    [10] J. P. Singh, W.-D. Weber and A. Gupta, “SPLASH: Stanford parallel applications for shared-memory,” Computer Architecture News, 1992.
    [11] D. Tarjan, S. Thoziyoor and N. P. Jouppi, “CACTI 4.0,” Technical report, Compaq Research Lab, 2006.
    [12] R. Ulfsnes, “A survey of low power design techniques for cache coherency in multiprocessor memory systems,” Semester project, NTNU, 2012.
    [13] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh and A. Gupta, “The SPLASH-2 Programs: Characterization and Methodological Considerations,” in 22nd International Symposium on Computer Architecture (ISCA), 1995.
    [14] D. H. Woo, M. Ghosh, E. Ozer, S. Biles and H.-H. S. Lee, “Reducing Energy of Virtual Cache Synonym Lookup using Bloom Filters,” CASES, 2006.

    Full text availability: not authorized for public release (campus network; off-campus network; National Central Library: National Digital Library of Theses and Dissertations in Taiwan)