
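Student: 吳東育 (Wu, Tung-Yu)
Thesis title: OpenCL Runtime Supports for Multi-core PAC DSPs
Chinese title: 支援多核心PAC DSP的OpenCL 執行期函式庫 (OpenCL Runtime Library Supporting Multi-core PAC DSPs)
Advisor: 李政崑 (Lee, Jenq-Kuen)
Oral defense committee: 許雅三, 游逸平
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Publication year: 2012
Academic year of graduation: 100 (ROC calendar, 2011-2012)
Language: English
Pages: 32
Keywords (Chinese): Open Computing Language (OpenCL), multi-core, runtime library
Keywords (English): PAC DSP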
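  Chinese abstract (English translation): OpenCL is an industry standard for unifying heterogeneous multi-core programming. At run time, OpenCL manages computation as work-groups, each composed of work-items. Each work-item executes in parallel on a processing element according to its globalID or localID, and each work-group executes on a compute unit. Although OpenCL has been widely and successfully adopted on CPU, GPU, and GPGPU platforms, it has rarely been used on embedded multi-core digital signal processor platforms. The main contribution of this thesis is an OpenCL runtime library and a compiler flow for an embedded multi-core DSP platform called PACDUO. PACDUO consists of one MPU and two five-way-issue VLIW DSPs. Each DSP contains three clusters: one controls program flow and the other two are dedicated to computation. To reduce the communication cost between the MPU and the DSPs and to make full use of the DSP hardware resources, this thesis integrates kernel serialization and kernel vectorization into the compilation flow. In the experiments, a set of OpenCL programs is used to test the usability of the OpenCL runtime library and the compiler optimizations. The experimental results show a 1.99-fold performance speedup with two DSPs.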
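The execution model above can be made concrete with a little index arithmetic. The C sketch below shows the standard OpenCL relation between a work-item's global and local IDs in one dimension. OpenCL C exposes these IDs through `get_global_id`, `get_group_id`, `get_local_size`, and `get_local_id`. The sketch also shows one simple way a runtime could hand independent work-groups to compute units. The helper names `global_id_of`, `group_range`, and `groups_for_unit` are hypothetical and are not the thesis runtime's API. Two further choices are assumptions made only for illustration: treating each DSP as one compute unit, and splitting work-groups into contiguous blocks. The thesis's actual platform mapping may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers illustrating the OpenCL index space (1-D NDRange,
 * global offset assumed to be zero). Not the thesis runtime's API. */

/* globalID = groupID * localSize + localID */
static size_t global_id_of(size_t group_id, size_t local_size, size_t local_id) {
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* A contiguous range of work-groups assigned to one compute unit. */
typedef struct {
    size_t first_group; /* index of the first work-group in the range */
    size_t count;       /* number of work-groups in the range */
} group_range;

/* Split num_groups work-groups as evenly as possible over num_units compute
 * units (assumed here to be one unit per DSP). The first
 * num_groups % num_units units each receive one extra work-group. */
static group_range groups_for_unit(size_t num_groups, size_t num_units, size_t unit) {
    assert(num_units > 0 && unit < num_units);
    size_t base = num_groups / num_units;
    size_t extra = num_groups % num_units;
    group_range r;
    r.count = base + (unit < extra ? 1 : 0);
    /* every earlier unit holds `base` groups, plus one if it got an extra */
    r.first_group = unit * base + (unit < extra ? unit : extra);
    return r;
}
```

For example, 5 work-groups on 2 compute units give groups 0-2 to the first unit and groups 3-4 to the second. A contiguous split is only one option. Round-robin assignment balances the load equally well when every work-group costs the same.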


    OpenCL is an open industry standard that aims to unify programming for heterogeneous multi-core systems. To express parallel computation uniformly, OpenCL organizes computations into work-groups, each of which consists of work-items.
    Each work-item executes independently on a processing element and is identified by its globalID and localID, while each work-group executes on a compute unit. Although OpenCL implementations have had early success on CPUs, GPUs, and GPGPUs, they have rarely been implemented on embedded multi-core DSP systems. This thesis presents an OpenCL runtime library and compiler flow support for embedded multi-core DSP systems. The target platform is PACDUO, a heterogeneous multi-core embedded system.
    PACDUO consists of one MPU and two five-way-issue VLIW DSPs. Each DSP includes three clusters: one for control flow and two for computation. To reduce context-switch overhead and exploit the clusters, kernel serialization and kernel vectorization are integrated into the compiler flow. In the experiments, a set of OpenCL benchmark programs is used to evaluate the usability of the runtime library and the effect of the compiler optimizations. The experimental results show a nearly 2-fold (1.99x) multi-core speedup with two DSPs.
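Kernel serialization and kernel vectorization can be illustrated on a simple vector-add kernel. The C sketch below is hand-written for illustration and is not output of the thesis's compiler. The function names are hypothetical.

`vadd_group_serialized` wraps the per-work-item body in a loop over local IDs, so a single DSP task runs a whole work-group. `vadd_group_vectorized2` processes two work-items per iteration as a stand-in for packing work-items into SIMD lanes or spreading them across the two computation clusters. Plain C cannot express the PAC DSP's SIMD intrinsics, so the two lanes appear as two independent statements.

```c
#include <assert.h>
#include <stddef.h>

/* Reference OpenCL C kernel (shown only as a comment; not compiled here):
 *
 *   __kernel void vadd(__global const int *a, __global const int *b,
 *                      __global int *c) {
 *       size_t i = get_global_id(0);
 *       c[i] = a[i] + b[i];
 *   }
 */

/* Kernel serialization (sketch): one call executes every work-item of one
 * work-group by looping over local IDs, so a single DSP task covers the
 * whole group instead of one context per work-item. */
static void vadd_group_serialized(const int *a, const int *b, int *c,
                                  size_t group_id, size_t local_size) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t lid = 0; lid < local_size; ++lid) {
        size_t gid = group_id * local_size + lid; /* plays the role of get_global_id(0) */
        c[gid] = a[gid] + b[gid];
    }
}

/* Kernel vectorization (scalar stand-in): the serialized loop advances two
 * work-items per iteration, one per hypothetical lane (e.g., one per
 * computation cluster or SIMD lane); a remainder loop handles an odd
 * local_size. */
static void vadd_group_vectorized2(const int *a, const int *b, int *c,
                                   size_t group_id, size_t local_size) {
    assert(a != NULL && b != NULL && c != NULL);
    size_t base = group_id * local_size;
    size_t lid = 0;
    for (; lid + 1 < local_size; lid += 2) {
        size_t g0 = base + lid;
        size_t g1 = g0 + 1;
        c[g0] = a[g0] + b[g0]; /* lane 0 */
        c[g1] = a[g1] + b[g1]; /* lane 1: independent of lane 0 */
    }
    for (; lid < local_size; ++lid) { /* leftover work-item, if any */
        size_t g = base + lid;
        c[g] = a[g] + b[g];
    }
}
```

With 6 work-items and a local size of 3, either function is called once per work-group, i.e., twice. Kernels that call `barrier()` cannot be serialized as one loop: the loop must be split at each barrier so that every work-item reaches the barrier before any continues past it. The vector-add kernel has no barrier, so the sketch omits that step.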

    Table of Contents
    Abstract
    Contents
    List of Figures
    1 Introduction
        1.1 Introduction
        1.2 Overview of the Thesis
    2 Background
        2.1 OpenCL Overview
        2.2 Target Multicore DSP Platforms
            2.2.1 PACDUO Evaluation Boards
            2.2.2 ESL Virtual Platforms
    3 OpenCL Runtime Supports for Multi-core PAC DSPs
        3.1 OpenCL Platform Mapping
        3.2 OpenCL Runtime Support
            3.2.1 Platform APIs
            3.2.2 Runtime APIs
            3.2.3 Built-in Functions
        3.3 OpenCL Compiler Support
            3.3.1 Kernel Serialization
            3.3.2 Kernel Vectorization
    4 Experiment
    5 Conclusions
    References


    Full text: not authorized for public release (on-campus network, off-campus network, and the National Central Library's Taiwan Theses and Dissertations System).