Enhanced Copy Propagations for Embedded VLIW DSP Processors with Irregular Register Files

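Author:	Sheng-Yuan Chen (陳聖元)
Thesis title:	Enhanced Copy Propagations for Embedded VLIW DSP Processors with Irregular Register Files
Advisor:	Jenq-Kuen Lee (李政崑)
Committee members:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2006
Academic year of graduation:	94
Language:	English
Number of pages:	40
Keywords (Chinese):	embedded systems, digital signal processors, distributed register files, cluster-based architecture, copy propagation
Keywords (English):	VLIW, DSP, embedded device, cluster-based architecture, irregular register files, copy propagation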

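With the rapid development of video and multimedia technology,
high-performance, low-power digital signal processors for embedded
systems have become increasingly important. For embedded system
designers, reducing power consumption and design complexity is an
important task. Current DSP designs therefore trend toward distributed
register files and cluster-based architectures, which reduce the number
of read and write ports between registers. This design trend brings new
challenges for compiler optimization. In this thesis, I propose an
enhanced copy propagation technique tailored to modern DSP
architectures. Experiments show that it avoids the performance loss
that conventional copy propagation can cause.
I propose a data flow analysis method that takes the DSP hardware
architecture into account. The method estimates the extra cost of
moving data between registers and finds a shortest propagation path, so
that data movement on this architecture does not degrade performance.
Because the model for estimating data movement depends closely on the
hardware architecture, I propose three cost models that describe the
cost of data movement on this hardware: Inter Cluster Communication
Cost, Intra Cluster Communication Cost, and Ping-Pong Register
Constraint Cost. Using these three models, I also propose an algorithm
that searches for the best transfer path, and I integrate it into the
PAC ORC compiler system to demonstrate its effectiveness. The
experimental platform is the ISS simulator for the PAC 2.0
architecture. The results show that, compared with conventional copy
propagation, our method reduces the performance loss by about 13% on
average for the floating-point DSPstone benchmark programs.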
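
As an illustration of the path search described above, the following
sketch treats register files as graph nodes and move instructions as
weighted edges, then finds the cheapest route with Dijkstra's
algorithm. The register-file names (`C1.A`, `C1.D`, and so on), the edge
weights, and the function `cheapest_path` are all hypothetical. This
record does not reproduce the thesis's actual cost models or search
algorithm, so Dijkstra's algorithm is used here only as a stand-in.

```python
import heapq

# Hypothetical move costs between register files (not taken from the thesis).
# Each key is a register file; each value maps a directly reachable register
# file to the assumed cost of one move. Moves inside a cluster are cheap,
# moves across clusters are more expensive.
MOVE_COST = {
    "C1.A": {"C1.D": 1},
    "C1.D": {"C1.A": 1, "C2.D": 3},
    "C2.D": {"C2.A": 1, "C1.D": 3},
    "C2.A": {"C2.D": 1},
}


def cheapest_path(graph, src, dst):
    """Return (total_cost, path) for the cheapest route from src to dst.

    Uses Dijkstra's algorithm. Returns (inf, []) if dst is unreachable.
    """
    queue = [(0, src, [src])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, step in graph.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return float("inf"), []
```

With these assumed weights, moving a value from `C1.A` to `C2.A` has to
pass through both `D` register files, so the cross-cluster hop
dominates the total cost.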

High-performance, low-power VLIW DSP processors are increasingly
deployed on embedded devices to process video and multimedia
applications. To reduce power and cost in VLIW DSP designs,
distributed register files and multi-bank register architectures are
being adopted to reduce the number of read/write ports in register
files. This presents new challenges for devising compiler optimization
schemes for such architectures. In our research work, we address the
compiler optimization issues for the PAC architecture, a 5-way issue
DSP processor with distributed register files. We show how to support
an important class of compiler optimizations, known as copy
propagation, on such architectures. We illustrate that a naive
deployment of copy propagation on embedded VLIW DSP processors with
distributed register files may result in performance anomalies. In our
proposed scheme, we derive communication cost models from cluster
distance, register port pressure, and the movement type of register
sets. The cost models guide the data flow analysis that supports copy
propagation on the PAC architecture. Experimental results show that
our schemes are effective in preventing performance anomalies caused
by copy propagation on embedded VLIW DSP processors with distributed
register files.
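
To make the performance anomaly concrete, here is a minimal sketch of
local copy propagation that rewrites an operand only when reading the
original register is no more expensive than reading the copy. It
assumes a single flat inter-cluster penalty, and the instruction tuple
layout, the `home` map, and the function `propagate_copies` are
hypothetical illustrations rather than the thesis's cost models or the
PAC compiler's implementation.

```python
def propagate_copies(block, home, inter_cluster_cost=2):
    """Cost-gated copy propagation over one straight-line block.

    block: list of (opcode, dest, sources, cluster) tuples, where cluster
           is the cluster that executes the instruction.
    home:  maps each register name to the cluster that holds it.
    """
    def cost(reg, cluster):
        # Assumed model: a same-cluster read is free, a cross-cluster read
        # pays a fixed penalty.
        return 0 if home[reg] == cluster else inter_cluster_cost

    copies = {}  # dest -> source for copies that are still valid
    result = []
    for opcode, dest, sources, cluster in block:
        new_sources = []
        for s in sources:
            origin = copies.get(s)
            # Rewrite the operand only when the origin is no costlier to read.
            if origin is not None and cost(origin, cluster) <= cost(s, cluster):
                new_sources.append(origin)
            else:
                new_sources.append(s)
        # Writing dest invalidates every recorded copy that involves dest.
        copies = {d: s for d, s in copies.items() if d != dest and s != dest}
        if opcode == "copy" and new_sources[0] != dest:
            copies[dest] = new_sources[0]
        result.append((opcode, dest, tuple(new_sources), cluster))
    return result
```

A conventional pass would replace every use of a copied register with
its source. In this sketch, a use that executes on the cluster holding
the copy keeps the copy, because rewriting it would add a cross-cluster
read.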

Acknowledgements
Abstract
Contents
List of Figures
List of Tables
Introduction
  1 Motivations
  2 Thesis Overview
Background
  1 PAC DSP Architecture
  2 Infrastructure Designs
    2.1 Compiler Infrastructure
    2.2 Code Expansion and Clustered Partitioning for PAC DSP
    2.3 Copy Propagation in EBO Phase
Enhanced Data Flow Analysis
  1 Problem Specifications
  2 An EDFA Algorithm and Cost Models
  3 An Advanced Estimation Algorithm
  4 Example
Experiments and Discussions
  1 Experimental Results
  2 Related Work and Discussions
Conclusion
  1 Summary
  2 Future Work
Bibliography

                                


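Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)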
