Profile Based Trace Selection of Code Layout Optimizations in Open64 Compiler

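Graduate student:	袁昊
Thesis title:	Profile Based Trace Selection of Code Layout Optimizations in Open64 Compiler
Advisor:	鍾葉青
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2008
Academic year of graduation:	96 (ROC calendar)
Language:	English
Number of pages:	36
Keywords (translated from Chinese):	code layout, compiler optimization, trace, cache memory, code locality
Keywords (English):	code layout, compiler optimization, profile, cache, instruction locality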

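Because processor speed has grown faster than memory speed, the processor core is now far faster than memory access, and program execution time is bounded mainly by memory speed. This phenomenon is called the Memory Wall, and it leads to heavy power consumption and degraded performance. For modern programs, cache memory has become more important for bridging this gap. However, large applications contain many instructions, and the cache may not be able to hold all of them; the resulting cache misses cost not only performance but also power. To avoid an excessive miss rate, we use a compiler optimization technique that improves code layout, rather than a larger cache or a more complex cache controller. We propose an improved trace selection algorithm that selects more accurate traces according to the program's behavior. Trace selection mainly aims to improve code locality so that the code about to be executed stays in the cache as much as possible, reducing the execution time the processor core wastes fetching from main memory. For portability, our algorithm is part of the machine-independent compiler optimizations. To evaluate it, we integrated our trace selection algorithm into the code generation optimizations in the back end of the Open64 compiler. In our experiments, we used a machine with two quad-core Intel(R) Xeon(R) processors, and the results show that our algorithm improves performance on the SPEC CINT2000 benchmark by up to 19%, and by 7% on average.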
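As a rough illustration of what profile-based trace selection does, the Python sketch below chains basic blocks along their most frequently executed edges so that hot successors become fall-through code. It is a generic greedy bottom-up chaining heuristic, not the improved algorithm proposed in the thesis (which this record does not describe in detail). The function name `select_traces`, the block names, and the edge counts are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (source block, destination block)


def select_traces(blocks: List[str], edge_counts: Dict[Edge, int]) -> List[List[str]]:
    """Greedy bottom-up chaining of basic blocks by profiled edge count.

    Assumption: blocks[0] is the procedure entry and must stay at the
    start of its trace.
    """
    entry = blocks[0]
    # Every block starts as its own single-block trace.
    trace_of = {b: [b] for b in blocks}

    # Visit edges from hottest to coldest; ties are broken by edge name
    # so the result is deterministic.
    ordered = sorted(edge_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (src, dst), count in ordered:
        if count == 0 or dst == entry:
            continue
        head, tail = trace_of[src], trace_of[dst]
        # Merge only when src ends one trace and dst begins a different one,
        # so the hot edge turns into a fall-through.
        if head is not tail and head[-1] == src and tail[0] == dst:
            head.extend(tail)
            for b in tail:
                trace_of[b] = head

    # Collect the distinct traces in order of first appearance in `blocks`.
    seen, traces = set(), []
    for b in blocks:
        t = trace_of[b]
        if id(t) not in seen:
            seen.add(id(t))
            traces.append(t)
    return traces


# Hypothetical edge profile for a small if/else diamond followed by block E.
profile = {
    ("A", "B"): 90, ("A", "C"): 10,
    ("B", "D"): 90, ("C", "D"): 10,
    ("D", "E"): 100,
}
traces = select_traces(["A", "B", "C", "D", "E"], profile)
layout = [b for t in traces for b in t]
```

With this made-up profile the hot path A, B, D, E is laid out contiguously and the rarely taken block C is moved out of line, which is the locality effect the abstract describes.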

Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Preliminaries
3.1 Profile and Feedback of Two-Pass Compilation
3.2 Overview of Code Layout
3.3 Description of Basic Blocks Reordering
3.4 Advantages and Disadvantages of Top-down and Bottom-up Algorithms
Chapter 4 Improved Trace Selection
4.1 Improved Trace Selection algorithm
4.2 An Example for Improved Trace Selection algorithm
Chapter 5 Experimental Results
5.1 Experimental Environments
5.1.1 Open64 Compiler
5.1.2 SPEC CINT2000 Benchmark
5.1.3 Hardware Infrastructure
5.2 Results
Chapter 6 Conclusions
References

                                

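Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)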
