Author: Wang, Yu-Chen (王鈺晟)
Title: Investigation of Polyhedral Transformations on CPU and GPU (探討在CPU與GPU上之多面體轉換最佳化)
Advisor: Lee, Jenq-Kuen (李政崑)
Committee members: 蘇泓萌, 陳呈瑋
Degree: Master
Department: Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2013
Graduating academic year: 101 (ROC calendar)
Language: English
Pages: 35
Chinese keywords: OpenCL (開放運算語言), Polyhedral Model (多面體模型)
English keywords: Polyhedral Model, RenderScript
The polyhedral model, a powerful mathematical framework for nested-loop optimization and parallelization, has been developed for decades. It has become popular because of its rich mathematical foundations. Many frameworks have been built on the polyhedral model, and they can be combined with compiler techniques to transform and analyze loop-nest code. In this thesis, we aim to investigate how polyhedral transformations can be applied on different heterogeneous multi-core platforms. To discover more of the potential applicability of polyhedral transformations, we choose Android RenderScript and OpenCL, which have received comparatively little polyhedral-model research, as our targets.
RenderScript is a component of the Android operating system that provides low-level APIs for heterogeneous computing. We perform kernel-level optimization with polyhedral transformations by integrating LLVM Polly into the RenderScript online JIT compiler (libbcc). In the experiment, we re-implement the PolyBench benchmark in RenderScript and compare performance before and after optimization on Android 4.1.1 Jelly Bean; on average, we speed up execution time by 58%.
OpenCL is another framework for writing programs that execute on heterogeneous platforms. In this thesis, we present an experimental study of applying polyhedral transformations to OpenCL kernel functions. In the experiment, we use the PolyBench/GPU benchmark to evaluate performance after optimization: loop tiling yields a 58% improvement on average, and loop interchange achieves over a 2x speedup.
The experimental results show that polyhedral transformations are more widely applicable than we expected.