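Graduate student: | 高崇閔 Kao, Chung-Min |
---|---|
Thesis title: | 基於LLVM技術開發之異質核心模擬器中GPU編譯器 : HTranslator The LLVM based GPU Compiler in Heterogeneous System Architecture Emulator: HTranslator |
Advisor: | 鍾葉青 (Yeh-Ching Chung) |
Oral defense committee: | 徐慰中 (Wei-Chung Hsu), 洪士灝 (Shih-Hao Hung) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2013 |
Academic year of graduation: | 101 |
Language: | English |
Number of pages: | 31 |
Keywords (Chinese): | 異質架構系統 、SIMD 、GPU 、模擬器 、編譯器 |
Keywords (English): | heterogeneous system architecture, SIMD, GPU, emulator, compiler |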
Heterogeneous System Architecture (HSA) is an open industry standard defined by the HSA Foundation, whose members include many major application processor vendors such as AMD, ARM, MediaTek, Samsung, and Qualcomm. This thesis builds on an emulator developed according to this standard. It describes the design and implementation of the compiler for the emulator's GPU component, which generates Single Instruction Multiple Data (SIMD) instructions to speed up emulation.
When emulating GPU execution under HSA, a CPU has far fewer threads than a physical GPU. If each GPU task is handed to a CPU thread, every CPU thread receives several tasks that the physical GPU would have run in parallel and executes them one after another. In most cases, however, GPU threads execute the same instructions on different data. Emitting SIMD instructions in these cases lets the hardware process several data items within a single instruction, so one CPU thread completes work that previously required several threads. This improves the emulator's performance and also brings its behavior closer to that of a physical GPU.
Conditional branches complicate this scheme: different GPU threads may jump to different target addresses, so they cannot be emulated directly with SIMD instructions. To handle this, the compiler restructures the program's control flow before generating machine code so that every instruction in each block a branch may target is executed. To keep the results correct, a bitmap records the outcome of each conditional branch for every GPU thread: when a conditional branch is encountered, whether each thread jumps is written to the bitmap. For threads that should not have executed the instructions in a target block, the emulator uses the bitmap to mask out their results.
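The bitmap masking idea described in the abstract can be illustrated with a small sketch. This is not HTranslator's code: the kernel, the lane width `LANES`, and the function names are hypothetical, and plain Python lists stand in for SIMD registers. It only shows the general pattern of running both branch targets for every lane and then using a per-lane bitmap to keep the result each lane should have produced.

```python
LANES = 4  # assumed SIMD width; hypothetical, not taken from the thesis


def kernel_scalar(x):
    """Reference version: one GPU thread at a time, with a real branch."""
    if x < 0:
        return -x      # "taken" block
    return x * 2       # "fall-through" block


def kernel_masked(xs):
    """Lane-wise version: both blocks run for every lane, then a bitmap selects."""
    assert len(xs) == LANES

    # Evaluate the branch condition per lane and record it in a bitmap
    # (bit i set means lane i takes the branch).
    bitmap = 0
    for lane, x in enumerate(xs):
        if x < 0:
            bitmap |= 1 << lane

    # Restructured control flow: execute both blocks on all lanes.
    # Each list comprehension stands in for one SIMD instruction.
    taken = [-x for x in xs]
    fallthrough = [x * 2 for x in xs]

    # Mask the results: each lane keeps only the value from the block
    # it would have executed under the original branch.
    out = [taken[i] if (bitmap >> i) & 1 else fallthrough[i]
           for i in range(LANES)]
    return out, bitmap


def run(data):
    """Process GPU threads in groups of LANES, as one CPU thread would."""
    assert len(data) % LANES == 0
    results = []
    for start in range(0, len(data), LANES):
        out, _ = kernel_masked(data[start:start + LANES])
        results.extend(out)
    return results
```

For the input `[3, -1, 0, -7]`, lanes 1 and 3 take the branch, so the bitmap is `0b1010` and the masked result matches what the scalar kernel returns for each element. The cost of this approach is that both blocks always execute, which pays off only when the SIMD speedup outweighs the wasted work on masked-out lanes.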