The LLVM based GPU Compiler in Heterogeneous System Architecture Emulator: HTranslator


Graduate student:	高崇閔 Kao, Chung-Min
Thesis title:	基於LLVM技術開發之異質核心模擬器中GPU編譯器 : HTranslator (The LLVM based GPU Compiler in Heterogeneous System Architecture Emulator: HTranslator)
Advisor:	鍾葉青
Oral defense committee:	徐慰中, 洪士灝
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2013
Academic year of graduation:	101
Language:	English
Number of pages:	31
Chinese keywords:	異質架構系統, SIMD, GPU, 模擬器, 編譯器
English keywords:	heterogeneous system architecture, SIMD, GPU, emulator, compiler

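Heterogeneous System Architecture (HSA) is an industry standard defined by the HSA Foundation, whose members include many major application processor vendors, such as AMD, ARM, MediaTek, Samsung, and Qualcomm. Based on an emulator developed according to this standard, this thesis describes the design and implementation of the compiler for the GPU part of the emulator, and generates Single Instruction Multiple Data (SIMD) instructions as an optimization.
When emulating GPU execution under HSA, a CPU has far fewer threads than a physical GPU. If each GPU execution is handed to one CPU thread, every thread is assigned several tasks of the original physical GPU and executes them in sequence. In most cases a GPU executes the same instructions on different data. Adding SIMD instructions in this situation lets the hardware process several data items within one SIMD instruction, so that one thread completes work that originally required several threads. This improves the emulator's performance and brings the emulation closer to how a GPU actually operates.
When conditional branch instructions are present, different GPUs may branch to different target addresses, so SIMD instructions cannot be used directly for the emulation. The compiler must therefore reconstruct the program's control flow before generating machine code, ensuring that all instructions in the block pointed to by any target address are executed. To keep the results correct, a bitmap records the conditional-branch outcome of each GPU: when a conditional branch occurs, whether each GPU takes the branch is written into the bitmap, and for GPUs that should not execute the instructions at that target address, the bitmap is used to mask their results.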

Heterogeneous System Architecture (HSA) is an open industry standard formulated by the HSA Foundation. Many application processor vendors, such as AMD, ARM, MediaTek, Samsung, and Qualcomm, are members. This thesis focuses on an emulator based on this standard and describes the design of its GPU compiler. In addition, it adds Single Instruction Multiple Data (SIMD) instructions to speed up the emulator's execution.
When simulating GPU execution under HSA, the number of CPU threads is far smaller than the number of threads in a physical GPU. If the emulator assigns the physical GPU's tasks to CPU threads, each thread receives more than one task and completes them one after another. In most situations, the physical GPU executes the same instructions on different data. In these cases, SIMD instructions can be added to speed up execution. With hardware support, the emulator can process several data items at once within a single SIMD instruction, so that one thread completes tasks that previously required several threads. This improves the emulator's performance and brings its behavior closer to a physical GPU's execution.
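The effect of batching tasks into SIMD lanes can be illustrated with a minimal Python sketch. It is not HTranslator's code: the lane width `LANES`, the one-addition kernel, and the functions `run_scalar`, `simd_add`, and `run_simd` are all assumptions made for illustration, and `simd_add` only stands in for a real SIMD instruction by counting one step per batch.

```python
# Hypothetical sketch; none of these names come from HTranslator.
LANES = 4  # assumed SIMD width, e.g. four 32-bit lanes in a 128-bit register


def kernel(a, b):
    """Per-task GPU kernel body (assumed here to be a single addition)."""
    return a + b


def run_scalar(xs, ys):
    """One CPU thread runs every GPU task in sequence: one step per task."""
    out, steps = [], 0
    for a, b in zip(xs, ys):
        out.append(kernel(a, b))
        steps += 1
    return out, steps


def simd_add(va, vb):
    """Stand-in for one SIMD add: all lanes are treated as a single step."""
    return [a + b for a, b in zip(va, vb)]


def run_simd(xs, ys, lanes=LANES):
    """The same CPU thread handles `lanes` GPU tasks per step."""
    out, steps = [], 0
    for i in range(0, len(xs), lanes):
        out.extend(simd_add(xs[i:i + lanes], ys[i:i + lanes]))
        steps += 1
    return out, steps
```

With ten tasks and four lanes, `run_simd` needs three steps where `run_scalar` needs ten, and both return the same values. The last batch is only partly filled, which a real implementation would have to handle explicitly.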
When conditional branch instructions exist, different GPUs may jump to different target addresses, so they cannot be simulated directly with SIMD instructions. To resolve this, before the compiler generates target code, it reconstructs the program's control flow so that every instruction in the blocks pointed to by the target addresses is executed. To ensure that adding SIMD instructions and reconstructing the control flow still produce correct results, a bitmap records the conditional-jump outcome of each GPU. When a conditional jump instruction is encountered, each GPU's jump outcome is written into the bitmap. For GPUs that should not execute the instructions in a block, the emulator uses the bitmap to mask their results.
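The bitmap masking idea can be sketched for a single if/else, under stated assumptions. The abstract does not give the actual control-flow reconstruction algorithm, so the code below only shows the general technique of running both blocks for every lane and letting a per-lane bitmap decide which results survive. The kernel (`x * 2` when `x > 0`, otherwise `-x`) and the names `branch_bitmap`, `masked_store`, `run_linearized`, and `run_reference` are hypothetical.

```python
# Hypothetical sketch of bitmap masking; not HTranslator's implementation.

def branch_bitmap(xs):
    """Bit i is set when lane i takes the branch (assumed condition: x > 0)."""
    bitmap = 0
    for lane, x in enumerate(xs):
        if x > 0:
            bitmap |= 1 << lane
    return bitmap


def masked_store(dst, src, bitmap):
    """Take src[lane] where the lane's bit is set; other lanes keep dst."""
    return [s if (bitmap >> lane) & 1 else d
            for lane, (d, s) in enumerate(zip(dst, src))]


def run_linearized(xs):
    """Execute both blocks for all lanes and mask the results per lane."""
    lanes = len(xs)
    all_lanes = (1 << lanes) - 1
    taken = branch_bitmap(xs)
    result = [0] * lanes
    then_vals = [x * 2 for x in xs]   # "then" block, computed for every lane
    result = masked_store(result, then_vals, taken)
    else_vals = [-x for x in xs]      # "else" block, computed for every lane
    result = masked_store(result, else_vals, ~taken & all_lanes)
    return taken, result


def run_reference(xs):
    """Ordinary per-lane branching, used to check the masked version."""
    return [x * 2 if x > 0 else -x for x in xs]
```

For the input `[3, -1, 0, 5]`, lanes 0 and 3 take the branch, so the bitmap is `0b1001`, and the masked result matches what per-lane branching produces. Nested branches and loops would need a stack or hierarchy of such bitmaps, which this sketch does not attempt.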

Chapter 1    Introduction
Chapter 2    Background
2.1 Heterogeneous System Architecture
2.2 Heterogeneous System Architecture Intermediate Language
Chapter 3    Related Work
3.1 HSA Emulator: HSAemu
3.2 Tiny Code Generator
3.3 Multi2Sim
3.4 Whole-Function Vectorization
3.5 LLVM infrastructure
Chapter 4    Translator in GPU Simulation
4.1 GPU simulation in HSA Emulator
4.2 GPU Just-In-Time Translator
4.2.1 Just-In-Time Translator
4.2.2 Linker and Loader
4.2.3 Special Instruction Simulation
    Memory Relative Instruction
    Mathematical Instruction
    Kernel Information Instruction
    Synchronization Instruction
Chapter 5    SIMD Instruction in GPU Simulation
5.1 Single Instruction Multiple Data
5.2 The Control Flow Graph Reconstruction
5.3 How to Do Bitmap Masking?
Chapter 6    Experiment Results and Discussion
6.1 Benchmarks
6.2 Experimental results
Chapter 7    Conclusion and Future Work
References

                                


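Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)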
