Author: | 簡士桓 Chien, Shih-Huan |
---|---|
Thesis title: | ViennaCL++: Enabling OpenCL C++ Flow in Linear Algebraic Acceleration Library for Machine Learning |
Advisor: | 李政崑 Lee, Jenq-Kuen |
Committee members: | 蘇泓萌 Su, Hong-Men; 陳鵬升 Chen, Peng-Sheng |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2018 |
Academic year of graduation: | 106 |
Language: | English |
Number of pages: | 37 |
Keywords: | Heterogeneous Computing, OpenCL, C++, SPIR-V, Eigen, ViennaCL, TensorFlow |
Heterogeneous computing on devices such as CPUs, GPUs, DSPs, FPGAs, and other specialized hardware accelerators has received great attention in recent years due to the increasing demand for high-performance computing. Heterogeneous multi-core systems can be very beneficial for compute-intensive applications such as training and inference in deep learning. Today, the ecosystem of deep learning applications relies heavily on framework libraries to handle the underlying implementation of deep learning and machine learning computations. TensorFlow is one such machine learning framework library; it uses Eigen, a C++ header library for linear algebra acceleration, for its core computational kernels. OpenCL, meanwhile, is an open standard for parallel programming on heterogeneous multi-core systems, and its latest specification supports the advanced OpenCL C++ kernel language.
In this thesis, we present a new software flow that enables heterogeneous computing through OpenCL and OpenCL C++ on TensorFlow/Eigen using the newly proposed ViennaCL++ linear algebra library. ViennaCL++ extends ViennaCL, a C++ linear algebra acceleration library that supports an OpenCL backend and data interfacing with Eigen. ViennaCL++ uses the state-of-the-art OpenCL 2.1/2.2 standard along with the SPIR-V flow and the OpenCL C++ kernel language. Moreover, we present enhancements to the OpenCL C++ computational kernels in ViennaCL++, including code refactoring, function templates, vector/matrix class templates, move semantics, and shared virtual memory. Our evaluation shows that, compared to the baseline, the OpenCL C++ kernels in ViennaCL++ perform similarly on level 1 BLAS benchmarks, are 3 to 12 times faster on level 2 BLAS benchmarks, and are 30 to 67 times faster on level 3 BLAS benchmarks, demonstrating that our scheme effectively enables acceleration of TensorFlow/Eigen through OpenCL.
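For readers unfamiliar with the BLAS levels named in the evaluation, the following is a minimal host-side sketch in plain C++ of one representative operation per level. It is not code from the thesis, from ViennaCL++, or from any OpenCL C++ kernel. The function names `axpy`, `gemv`, and `gemm` follow the usual BLAS naming, and the `Matrix` type is a hypothetical helper introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major matrix, used only in this sketch.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Level 1 BLAS (vector-vector): y = alpha * x + y.
// About n multiply-adds on about n elements of data.
void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}

// Level 2 BLAS (matrix-vector): returns A * x.
// About n^2 multiply-adds on about n^2 elements of data.
std::vector<double> gemv(const Matrix& A, const std::vector<double>& x) {
    assert(A.cols == x.size());
    std::vector<double> y(A.rows, 0.0);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t j = 0; j < A.cols; ++j) {
            y[i] += A(i, j) * x[j];
        }
    }
    return y;
}

// Level 3 BLAS (matrix-matrix): returns A * B.
// About n^3 multiply-adds on about n^2 elements of data.
Matrix gemm(const Matrix& A, const Matrix& B) {
    assert(A.cols == B.rows);
    Matrix C(A.rows, B.cols);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t k = 0; k < A.cols; ++k) {
            for (std::size_t j = 0; j < B.cols; ++j) {
                C(i, j) += A(i, k) * B(k, j);
            }
        }
    }
    return C;
}
```

The comments note how the amount of arithmetic per element of data grows from level 1 to level 3. This is a general property of these operations and is offered only as background for the reported speedups, not as the thesis's own explanation of them.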