| Field | Value |
|---|---|
| Graduate Student | 莫亞橋 Ya-Chiao Moo |
| Thesis Title | 支援平行核心數位訊號處理器之巢狀迴圈最佳化與跨程序分析 Loop Nested Optimizations and Interprocedural Analysis for PAC DSP |
| Advisor | 李政崑 Jenq-Kuen Lee |
| Oral Examination Committee | |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science (電機資訊學院 - 資訊工程學系) |
| Publication Year | 2006 |
| Graduation Academic Year | 94 |
| Language | English |
| Pages | 55 |
| Chinese Keywords | 中繼表示法 (Intermediate Representation), 巢狀迴圈最佳化 (Loop Nested Optimization), 跨程序分析 (Interprocedural Analysis) |
| English Keywords | ORC, Intermediate Representation, Loop Nested Optimization, Interprocedural Analysis |
The Open Research Compiler (ORC) provides a powerful and complete two-phase optimization framework for optimizing programs at compile time. One phase is the low-level optimization, carried out on CGIR (Intermediate Representation for Code Generation), an intermediate representation whose instructions correspond one-to-one to assembly. The other is the source-level optimization, carried out on WHIRL (Winning Hierarchical Intermediate Representation). This thesis gives an overview of ORC, introducing the compiler architecture and how ORC is actually constructed, and describes how the optimizations in each phase work, including the intermediate-representation transformations and the compilation flow. Our research focuses on two WHIRL-level optimizations, Loop Nested Optimization and Interprocedural Analysis; the theory and algorithms behind each, as realized in ORC, are explained. In addition, these two optimizations are ported to the Parallel Architecture Core digital signal processor (PAC DSP), and the porting process and its difficulties are described in detail. Finally, the effect of the optimizations is evaluated on the existing platform, and some interesting results are discussed.
To optimize computer programs at compile time, the Open Research Compiler (ORC) provides a well-integrated optimization framework with two phases of optimization. One is the machine-level phase, applied to the Intermediate Representation for Code Generation (CGIR), a machine-level intermediate representation (IR) whose instructions map one-to-one to target instructions. The other is the source-level phase, applied to the Winning Hierarchical Intermediate Representation Language (WHIRL). This thesis gives an overview of ORC, introduces the compiler architecture implemented in ORC and the evolution of ORC, and illustrates how these compiler phases work, including the IR flow and the compilation process.
The research focuses on two optimizations at the WHIRL level: Loop Nested Optimization (LNO) and Interprocedural Analysis (IPA). For each, the theoretical concepts, the algorithms, and the way they are practiced in ORC are illustrated. In addition, LNO and IPA are ported to a new target architecture, the Parallel Architecture Core (PAC) Digital Signal Processor (DSP), and the porting process and the issues that were solved are presented in detail. Experiments with LNO and IPA are conducted on this platform to demonstrate performance improvements and to discuss some interesting results.
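As a concrete illustration of the kind of loop-nest transformation an LNO pass performs, the C sketch below shows loop interchange applied to a column-wise array traversal to improve spatial locality. The function names and array size are hypothetical, chosen only for illustration, and the transformed version is hand-written rather than actual ORC output; an IPA pass complements such transformations by analyzing and, for example, inlining across procedure boundaries so that whole loop nests become visible to the loop optimizer.

```c
#include <stddef.h>

#define N 256

/* Original loop nest: the inner loop strides over rows, so successive
 * accesses to a[i][j] touch different cache lines (poor spatial locality). */
void sum_elements_before(double a[N][N], double *total)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i][j];
    *total = s;
}

/* After loop interchange, the kind of transformation an LNO pass may apply:
 * the inner loop now walks contiguous memory, improving cache behavior.
 * (A hand-written sketch of the effect, not ORC's actual output.) */
void sum_elements_after(double a[N][N], double *total)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i][j];
    *total = s;
}
```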