
Graduate student: Peng, Fei (彭飛)
Thesis title: OpenCL 2.0 Enabled HSA Hardware Platform Emulation (Chinese title: 支援OpenCL 2.0 HSA硬體平台模擬)
Advisor: Chung, Yeh Ching (鍾葉青)
Oral examination committee: Hsu, Wei Chung (徐慰中); King, Chung Ta (金仲達); Hung, Shih Hao (洪士灝); Chen, Tien Fu (陳添福); Chung, Yeh Ching (鍾葉青)
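Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar, i.e. 2014-2015)
Language: English
Number of pages: 34
Keywords (Chinese record): OpenCL 2.0, Heterogeneous System Architecture, HSA, emulator, GPGPU
Keywords (English record): OpenCL 2.0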
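Abstract (translated from the Chinese version):
    Heterogeneous System Architecture (HSA) is an open industry standard that tightly integrates the CPU with various kinds of accelerators and supports a broad range of data-parallel programming models. Although HSA-compatible hardware already exists, it does not fully support the standard. Moreover, most software that uses HSA is still under development, so an emulation environment for verifying programs and tool chains is very helpful at this stage.
    This thesis presents an HSA emulator that supports OpenCL 2.0; it is a system emulator built on the HSAemu framework. The emulator supports the new OpenCL 2.0 features on a heterogeneous system: it uses HSA Shared Virtual Memory to provide the corresponding OpenCL 2.0 feature, and it additionally provides the OpenCL 2.0 Generic Address Space, Device Enqueue, Pipe, and C11 atomics. Developers can therefore write OpenCL programs and verify whether those programs benefit from running on HSA. In our preliminary experiments, the emulator passes benchmarks for each of these features, and it can help developers verify program results and performance.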


Abstract (English version):
    Heterogeneous System Architecture (HSA) is an open industry standard that tightly couples the CPU with a variety of accelerators and is designed to support data-parallel programming models. Although HSA-compatible machines exist, none of them yet supports HSA fully. Many software components that use HSA are still in development, so an emulation environment for verifying HSA software components and tool chains is useful.
    In this thesis, we introduce an HSA emulation platform that supports OpenCL 2.0 and is based on the HSAemu framework. The platform combines OpenCL 2.0 with the Heterogeneous System Architecture: it uses HSA Shared Virtual Memory to implement the Shared Virtual Memory feature of OpenCL 2.0, and it supplies other new features such as the Generic Address Space, Device Enqueue, Pipe, and C11 atomics. Programmers can use it to verify whether a program written in OpenCL can benefit from the Heterogeneous System Architecture. In our preliminary experiments, HSAemu passes benchmarks for each of the features mentioned above, and it can help developers verify program results and performance.
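On an HSA platform, an OpenCL 2.0 pointer in the generic address space is a flat address, and the hardware (or an emulator) must decide at access time whether it refers to the group (OpenCL local), private, or global segment. The C sketch below shows one plausible way an emulator could make that decision by checking the address against a group aperture and a private aperture, loosely following the HSA flat-addressing model. The types `seg_kind` and `aperture`, the function `classify_flat`, and any aperture values are hypothetical and are not taken from HSAemu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Segment that a generic (flat) address resolves to. */
typedef enum { SEG_GLOBAL, SEG_GROUP, SEG_PRIVATE } seg_kind;

/* A window of the flat address space reserved for one segment.
 * Real aperture locations are chosen by the platform; any values
 * used with this sketch are hypothetical. */
typedef struct {
    uint64_t base;
    uint64_t size;
} aperture;

/* True if addr lies in [base, base + size); written to avoid overflow. */
int in_aperture(const aperture *ap, uint64_t addr) {
    return addr >= ap->base && addr - ap->base < ap->size;
}

/* Resolve a generic address: the group and private apertures are checked
 * first and every other address is treated as global. *offset receives
 * the segment-relative offset (for global, the address itself). */
seg_kind classify_flat(uint64_t flat, const aperture *group,
                       const aperture *priv, uint64_t *offset) {
    assert(group != NULL && priv != NULL && offset != NULL);
    if (in_aperture(group, flat)) {
        *offset = flat - group->base;
        return SEG_GROUP;
    }
    if (in_aperture(priv, flat)) {
        *offset = flat - priv->base;
        return SEG_PRIVATE;
    }
    *offset = flat;
    return SEG_GLOBAL;
}
```

After classification, an emulator would route the access to the memory of the current work-group or work-item using the returned offset; because only two small ranges are tested, the common global case costs just two comparisons.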
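An OpenCL 2.0 pipe is a FIFO of fixed-size packets, and the `read_pipe`/`write_pipe` built-ins return 0 on success and a negative value when the pipe is empty or full. Below is a minimal single-threaded sketch of how an emulator might back a pipe object with a ring buffer. The names `emu_pipe`, `pipe_init`, `pipe_write`, and `pipe_read` are hypothetical, and the sketch omits the locking and the reservation built-ins (`reserve_read_pipe` and related) that concurrent work-items would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PIPE_MAX_PACKETS 64
#define PIPE_MAX_PACKET_SIZE 16

/* Hypothetical emulator-side pipe object backed by a ring buffer. */
typedef struct {
    size_t packet_size; /* bytes per packet, fixed at creation */
    size_t capacity;    /* maximum number of packets */
    size_t head;        /* slot of the next packet to read */
    size_t count;       /* packets currently stored */
    unsigned char data[PIPE_MAX_PACKETS * PIPE_MAX_PACKET_SIZE];
} emu_pipe;

/* Returns 0 on success, -1 if the parameters exceed the fixed limits. */
int pipe_init(emu_pipe *p, size_t packet_size, size_t capacity) {
    assert(p != NULL);
    if (packet_size == 0 || packet_size > PIPE_MAX_PACKET_SIZE ||
        capacity == 0 || capacity > PIPE_MAX_PACKETS)
        return -1;
    p->packet_size = packet_size;
    p->capacity = capacity;
    p->head = 0;
    p->count = 0;
    return 0;
}

/* Mirrors write_pipe: 0 on success, negative if the pipe is full. */
int pipe_write(emu_pipe *p, const void *packet) {
    assert(p != NULL && packet != NULL);
    if (p->count == p->capacity)
        return -1;
    size_t tail = (p->head + p->count) % p->capacity;
    memcpy(p->data + tail * p->packet_size, packet, p->packet_size);
    p->count++;
    return 0;
}

/* Mirrors read_pipe: 0 on success, negative if the pipe is empty. */
int pipe_read(emu_pipe *p, void *packet) {
    assert(p != NULL && packet != NULL);
    if (p->count == 0)
        return -1;
    memcpy(packet, p->data + p->head * p->packet_size, p->packet_size);
    p->head = (p->head + 1) % p->capacity;
    p->count--;
    return 0;
}
```

Storing a head index and a count, rather than head and tail indices, keeps the full and empty cases unambiguous without wasting a slot.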
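OpenCL 2.0 adopts C11-style atomics (`atomic_fetch_add_explicit` and related functions) with an extra memory-scope argument. If an emulator runs simulated work-items on host threads, one straightforward approach is to map each OpenCL atomic onto the matching host C11 atomic and treat every scope as system-wide, which is conservative but correct. The sketch below assumes that design; `emu_atomic_fetch_add`, `work_item`, and `run_work_items` are hypothetical names, and the code needs a POSIX threads implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical emulator hook for an OpenCL C
 * atomic_fetch_add_explicit(obj, v, order, scope) on a 32-bit uint.
 * The scope argument is dropped: a host C11 atomic is coherent across
 * all host threads, which satisfies every OpenCL memory scope. */
unsigned emu_atomic_fetch_add(atomic_uint *obj, unsigned v, memory_order order) {
    return atomic_fetch_add_explicit(obj, v, order);
}

typedef struct {
    atomic_uint *counter;
    unsigned iterations;
} work_item_arg;

/* Body of one simulated work-item: increment the shared counter. */
static void *work_item(void *arg) {
    work_item_arg *w = arg;
    for (unsigned i = 0; i < w->iterations; i++)
        emu_atomic_fetch_add(w->counter, 1u, memory_order_relaxed);
    return NULL;
}

/* Run at most 16 simulated work-items as host threads and return the
 * final counter value. If a thread cannot be created, the remaining
 * work-items are skipped and the result is below n * iterations. */
unsigned run_work_items(unsigned n, unsigned iterations) {
    enum { MAX_ITEMS = 16 };
    pthread_t threads[MAX_ITEMS];
    work_item_arg args[MAX_ITEMS];
    atomic_uint counter;
    unsigned created = 0;

    assert(n <= MAX_ITEMS);
    if (n > MAX_ITEMS) /* defensive clamp for NDEBUG builds */
        n = MAX_ITEMS;
    atomic_init(&counter, 0u);
    for (; created < n; created++) {
        args[created].counter = &counter;
        args[created].iterations = iterations;
        if (pthread_create(&threads[created], NULL, work_item, &args[created]) != 0)
            break;
    }
    for (unsigned i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    /* pthread_join already orders every increment before this load. */
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

Relaxed ordering is enough for a pure counter; a kernel that publishes data through an atomic flag would instead pass `memory_order_release` on the store side and `memory_order_acquire` on the load side through the same hook.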
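Device Enqueue lets a running kernel submit child kernels without a round trip to the host. On HSA this maps naturally onto a user-mode queue with monotonically increasing read and write indices that a GPU task dispatcher drains. The C sketch below models that idea under simplifying assumptions: `dev_queue`, `dispatch_packet`, `dq_enqueue`, `dq_drain`, and `tree_kernel` are hypothetical, the packet is far smaller than a real AQL packet, and children run strictly in FIFO order on a single thread, whereas OpenCL itself constrains child ordering only through enqueue flags and events.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DQ_SIZE 64u /* a power of two, so indices can be masked */

struct dev_queue;
typedef void (*kernel_fn)(struct dev_queue *q, int arg);

/* Hypothetical dispatch packet: a kernel entry point and one argument. */
typedef struct {
    kernel_fn kernel;
    int arg;
} dispatch_packet;

typedef struct dev_queue {
    dispatch_packet ring[DQ_SIZE];
    uint64_t write_index; /* only ever increases */
    uint64_t read_index;  /* only ever increases */
} dev_queue;

/* Device-side enqueue: 0 on success, -1 if the queue is full. */
int dq_enqueue(dev_queue *q, kernel_fn k, int arg) {
    assert(q != NULL && k != NULL);
    if (q->write_index - q->read_index == DQ_SIZE)
        return -1;
    dispatch_packet *pkt = &q->ring[q->write_index & (DQ_SIZE - 1)];
    pkt->kernel = k;
    pkt->arg = arg;
    q->write_index++;
    return 0;
}

/* Dispatcher loop: run packets in order until the queue is empty.
 * Kernels may enqueue children while running. Returns packets executed. */
unsigned dq_drain(dev_queue *q) {
    unsigned executed = 0;
    while (q->read_index != q->write_index) {
        /* Copy the packet out first so a child may reuse the freed slot. */
        dispatch_packet pkt = q->ring[q->read_index & (DQ_SIZE - 1)];
        q->read_index++;
        pkt.kernel(q, pkt.arg);
        executed++;
    }
    return executed;
}

/* Example parent kernel: enqueues two children until depth reaches 0,
 * like a recursive divide-and-conquer kernel. */
void tree_kernel(dev_queue *q, int depth) {
    if (depth > 0) {
        dq_enqueue(q, tree_kernel, depth - 1);
        dq_enqueue(q, tree_kernel, depth - 1);
    }
}
```

Enqueuing `tree_kernel` with depth 3 and draining the queue runs 1 + 2 + 4 + 8 = 15 packets; the full-queue check compares the distance between the two indices with `DQ_SIZE`, so unsigned wrap-around of the indices never needs special handling.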

    Table of contents:
    Chapter 1 Introduction
    Chapter 2 Related Work
    Chapter 3 Architecture
        3.1 CPU Simulation Module
        3.2 GPU Task Dispatcher
        3.3 GPU Simulators
        3.4 GPU Helper Functions
    Chapter 4 Improvement
        4.1 Floating-Point Exception Handler
            4.1.1 Specifications and Strategies
            4.1.2 Implementation on Emulation Platform
        4.2 Cache Coherence Profiling
            4.2.1 Specifications
            4.2.2 Design of CPU and GPU Shared Cache
    Chapter 5 OpenCL 2.0 Hardware Implementation
        5.1 Shared Virtual Memory
        5.2 Generic Address Space
        5.3 Atomic C11
        5.4 Device Enqueue
        5.5 Pipe
    Chapter 6 Experiment and Result
        6.1 Benchmarks
        6.2 Experimental Results
    Chapter 7 Conclusion and Future Work
    References

    Full text availability: not authorized for public release on either the on-campus or the off-campus network.
