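Author: 蔡昕霓
Tsai, Shin-Ni
Thesis title: 異質多核心系統的晶片網路設計之效能分析
Evaluation of NoC Design for Heterogeneous Multicore Systems
Advisor: 金仲達
King, Chung-Ta
Committee members: 黃婷婷
Huang, Ting-Ting
黃稚存
Huang, Chih-Tsun
Degree: Master
Department:
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Number of pages: 21
Keywords (Chinese): 晶片網路 (network-on-chip), 異質系統 (heterogeneous system)
Keywords (English): multiprocessor interconnection network, heterogeneous system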
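  • Heterogeneous multicore systems can combine processors with different
    functions on the same chip, so that each workload on the chip can be executed
    on the processor best suited to it. Among these processors, the Graphics
    Processing Unit (GPU) is one of the most frequently used, not only because its
    technology is sufficiently mature but also because it offers high computing
    performance. A Network-on-Chip (NoC) is commonly used to connect the Central
    Processing Units (CPUs), caches, and memory controllers, so the design of the
    NoC affects the performance of the entire system. In general, CPUs and GPUs
    play different roles and execute different types of workloads, and the traffic
    they inject into the NoC has different characteristics. GPUs, for example,
    generate a large amount of traffic at once and are easily affected by the
    throughput of the whole network. Therefore, both the placement of GPUs in the
    NoC and the network resources allocated to GPUs must be chosen so as not to
    degrade GPU performance. This thesis analyzes the performance of the NoC under
    different processor placements and different network resource allocations. The
    gem5-gpu simulator is used to simulate programs running on a heterogeneous
    CPU-GPU multicore system and to record their memory accesses. Garnet2.0 is
    then modified to use the recorded memory access information to simulate
    different placements of CPU and GPU cores on the NoC and to analyze their
    performance.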


    Heterogeneous multicore systems integrate different types of processors on the same
    chip in order to match workloads with the most appropriate processors. Among
    the different types of processors, the Graphics Processing Unit (GPU) is one of the
    most commonly used, not only because of its technological maturity but also
    because of its high computing performance. To interconnect the multiple CPUs,
    GPUs, caches, and memory controllers on a chip, a network-on-chip (NoC) is often
    used, and its design is critical to the performance of the whole system. As
    CPUs and GPUs often play different roles and execute different workloads, the traffic
    injected into the NoC by CPUs and GPUs may have very different characteristics.
    GPU cores tend to generate bursty, high-volume traffic, which is throughput-sensitive.
    Therefore, the placement of GPU cores in the NoC and the network resources
    allocated to the GPUs should be designed carefully so as not to hinder
    the performance of the GPUs. In this thesis, we evaluate the performance of the NoC
    under different processor placements and network resource allocations. We use the
    gem5-gpu simulator to simulate applications running on a heterogeneous CPU-GPU
    multicore system and record the memory access trace. We then modify Garnet2.0 to
    take the trace as input and evaluate the performance of different system
    configurations. The evaluations show that GPU performance is sensitive to the
    proximity of GPU cores to memory. This proximity is therefore a critical factor
    in heterogeneous NoC design.
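
The idea of "proximity to memory" on a mesh can be illustrated with a small calculation. The following is a minimal sketch, not the configuration or the code used in the thesis: it assumes a hypothetical 4x4 mesh, memory controllers on one edge, dimension-ordered (XY) routing so that the hop count is the Manhattan distance, and two made-up GPU placements. All names (`hops`, `mean_hops_to_nearest_mc`, `gpu_near`, `gpu_far`) are illustrative.

```python
def hops(a, b):
    """Hop count between two mesh nodes, assuming XY routing (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mean_hops_to_nearest_mc(cores, mem_ctrls):
    """Average distance from each core to its closest memory controller."""
    return sum(min(hops(c, m) for m in mem_ctrls) for c in cores) / len(cores)


# Hypothetical 4x4 mesh; memory controllers occupy column 0 (an assumption).
MESH = 4
mem_ctrls = [(0, y) for y in range(MESH)]

# Two hypothetical placements of four GPU cores.
gpu_near = [(1, y) for y in range(MESH)]  # column adjacent to the memory controllers
gpu_far = [(3, y) for y in range(MESH)]   # column on the opposite edge

near = mean_hops_to_nearest_mc(gpu_near, mem_ctrls)
far = mean_hops_to_nearest_mc(gpu_far, mem_ctrls)
print(f"near placement: {near:.1f} hops, far placement: {far:.1f} hops")
```

Hop count is only a rough proxy. A trace-driven simulation such as the one described in the abstract also captures contention and throughput effects that this calculation ignores.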

    Abstract (in Chinese)
    Abstract
    1 Introduction
    2 Methodology
      2.1 The default gem5-gpu architecture
      2.2 The techniques of simulating Mesh topology for CPU-GPU heterogeneous system
        2.2.1 NoC simulator
        2.2.2 Lifetime of memory access in gem5-gpu simulator
        2.2.3 Ensnare and process the memory access event from Ruby memory system
      2.3 Implementation of simulating heterogeneous mesh NoC
        2.3.1 Implementation issues
          Issue 1: Virtual network mapping
          Issue 2: Response message delivery
          Issue 3: Message delivery between cache controller and Garnet2.0
      2.4 Evaluation of the heterogeneous NoC design
    3 Evaluation
      3.1 Evaluation Infrastructure
      3.2 Application Workloads
      3.3 Evaluation Results
    4 Conclusions
    Bibliography
