Author: Wang, Yu-Chen (王鈺晟)
Title: Investigation of Polyhedral Transformations on CPU and GPU (探討在CPU與GPU上之多面體轉換最佳化)
Advisor: Lee, Jenq-Kuen (李政崑)
Committee members: 蘇泓萌, 陳呈瑋
Degree: Master
Department: Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2013
Graduating academic year: 101 (ROC calendar)
Language: English
Pages: 35
Chinese keywords: OpenCL (開放運算語言), Polyhedral Model (多面體模型)
English keywords: Polyhedral Model, RenderScript
The polyhedral model, a powerful mathematical framework for nested-loop optimization and parallelization, has been developed for decades. It has become popular because of its rich mathematical foundations. Many frameworks have been built on the polyhedral model, and they can be combined with compiler techniques to transform and analyze loop-nest code. In this thesis, we aim to investigate how polyhedral transformations can be applied on different heterogeneous multi-core platforms. To discover more of the potential applicability of polyhedral transformations, we choose Android RenderScript and OpenCL, which have received comparatively little polyhedral-model research, as our targets.
RenderScript is a component of the Android operating system that provides low-level APIs for heterogeneous computing. We perform kernel-level optimization with polyhedral transformations by integrating LLVM Polly into the RenderScript online JIT compiler (libbcc). In the experiment, we re-implement the PolyBench benchmark in RenderScript and compare performance before and after optimization on Android 4.1.1 Jelly Bean; on average, we speed up execution time by 58%.
OpenCL is another framework for writing programs that execute on heterogeneous platforms. In this thesis, we present an experimental study of applying polyhedral transformations to OpenCL kernel functions. In the experiment, we use the PolyBench/GPU benchmark to evaluate performance after optimization: loop tiling yields a 58% improvement on average, and loop interchange achieves over a 2x speedup.
The experimental results show that polyhedral transformations are more widely applicable than we expected.