
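Graduate student: Sheng-Yuan Chen (陳聖元)
Thesis title: Enhanced Copy Propagations for Embedded VLIW DSP Processors with Irregular Register Files (改良資料傳遞技術在非對稱暫存器的嵌入式VLIW數位信號處理器)
Advisor: Jenq-Kuen Lee (李政崑)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Number of pages: 40
Keywords (translated from Chinese): embedded system, digital signal processor, distributed register files, cluster-based architecture, copy propagation
Keywords (English): VLIW, DSP, embedded device, cluster-based architecture, irregular register files, copy propagation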
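  • With the rapid growth of video and multimedia technology,
    high-performance, low-power digital signal processors (DSPs) for
    embedded systems have become increasingly important. For embedded
    system designers, reducing power consumption and design complexity
    is an important task. DSP designs are therefore trending toward
    distributed register files and cluster-based architectures, which
    reduce the number of read and write ports between registers. This
    design trend brings new challenges for compiler optimization. In
    this thesis, I propose a copy propagation technique enhanced for
    modern DSP architectures; experiments confirm that the technique
    avoids the performance loss that conventional copy propagation can
    cause.
    I propose a data flow analysis method that takes the DSP hardware
    architecture into account. The method estimates the extra cost of
    moving data between registers and finds a shortest propagation
    path, avoiding the performance cost that data movement incurs on
    this architecture. Because the model for estimating data movement
    is closely tied to the hardware architecture, I propose three cost
    models to describe the cost of data movement on this hardware:
    Inter Cluster Communication Cost, Intra Cluster Communication
    Cost, and Ping-Pong Register Constraint Cost. I also use these
    three models to propose an algorithm that finds the best transfer
    path, and I integrate the algorithm into the PAC ORC compiler
    system to demonstrate its effectiveness. The experimental platform
    is ISS, a simulator based on the PAC 2.0 architecture. The results
    show that, compared with conventional copy propagation, our method
    reduces performance loss by about 13% on average for the
    floating-point DSPstone benchmark programs.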
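
As a rough illustration of the path search described above, the sketch below runs Dijkstra's algorithm over a small graph of register files. The register file names (`c1.A`, `c2.D`, and so on), the edges, and every cost value are invented for illustration and are not taken from the thesis or from the PAC architecture; the three cost terms are only named after the models listed in the abstract, and summing them is an assumption.

```python
import heapq

# Hypothetical register-file graph (an assumption, not PAC's real layout):
# node -> {neighbor: (inter_cluster, intra_cluster, ping_pong)} cost terms.
MOVES = {
    "c1.A": {"c1.D": (0, 1, 0), "c2.A": (2, 0, 0)},
    "c1.D": {"c1.A": (0, 1, 1), "c2.D": (2, 0, 1)},
    "c2.A": {"c2.D": (0, 1, 0)},
    "c2.D": {},
}

def move_cost(terms):
    """Combine the three cost terms of one move; plain addition is assumed."""
    inter, intra, ping_pong = terms
    return inter + intra + ping_pong

def cheapest_path(src, dst, moves=MOVES):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    frontier = [(0, src, [src])]
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, terms in moves.get(node, {}).items():
            if nxt not in settled:
                heapq.heappush(frontier, (cost + move_cost(terms), nxt, path + [nxt]))
    return float("inf"), []

print(cheapest_path("c1.A", "c2.D"))
```

With these made-up costs, the route through `c2.A` (total 3) beats the route through `c1.D` (total 4), which is the kind of comparison a cost-guided analysis would make before choosing where a value should travel.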


    High-performance, low-power VLIW DSP processors are increasingly
    deployed in embedded devices to process video and multimedia
    applications. To reduce power and cost, VLIW DSP designs are
    adopting distributed register files and multi-bank register
    architectures, which reduce the number of read/write ports in the
    register files. This presents new challenges for devising
    compiler optimization schemes for such architectures. In our
    research work, we address compiler optimization issues for the PAC
    architecture, a 5-way issue DSP processor with distributed
    register files. We show how to support an important class of
    compiler optimizations, known as copy propagation, on such
    architectures. We illustrate that a naive deployment of copy
    propagation on embedded VLIW DSP processors with distributed
    register files can cause performance anomalies. In our proposed
    scheme, we derive communication cost models from cluster distance,
    register port pressure, and the movement type of register sets.
    The cost models guide the data flow analysis that supports copy
    propagation on the PAC architecture. Experimental results show
    that our schemes are effective at preventing performance anomalies
    when copy propagation is applied to embedded VLIW DSP processors
    with distributed register files.
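
To make the difference between naive and cost-guided copy propagation concrete, here is a minimal sketch of local copy propagation over a straight-line instruction list. The instruction tuples, the `cluster()` naming rule, and the guard "replace an operand only if the copy source sits in the same cluster as the instruction's destination" are simplifying assumptions for illustration; they stand in for the thesis's cost models and do not reproduce them.

```python
def cluster(reg):
    """Hypothetical naming rule: register 'c1_r3' belongs to cluster 'c1'."""
    return reg.split("_")[0]

def propagate_copies(block, cost_guided=True):
    """Local copy propagation.

    block: list of (op, dst, srcs); ('mov', d, (s,)) means the copy d <- s.
    Returns a new instruction list with operands replaced by copy sources.
    """
    copies = {}  # register -> the source register it currently mirrors
    out = []
    for op, dst, srcs in block:
        new_srcs = []
        for s in srcs:
            r = copies.get(s, s)
            # Assumed guard: keep the original operand if the replacement
            # would make this instruction read from a different cluster.
            if cost_guided and r != s and cluster(r) != cluster(dst):
                r = s
            new_srcs.append(r)
        # Writing dst kills every recorded copy that involves dst.
        copies = {d: s for d, s in copies.items() if d != dst and s != dst}
        if op == "mov":
            copies[dst] = new_srcs[0]
        out.append((op, dst, tuple(new_srcs)))
    return out

block = [
    ("mov", "c2_r1", ("c1_r0",)),          # cross-cluster copy
    ("add", "c2_r2", ("c2_r1", "c2_r3")),  # consumer in cluster c2
]
print(propagate_copies(block, cost_guided=False))
print(propagate_copies(block, cost_guided=True))
```

In the unguarded run the `add` ends up reading `c1_r0` across clusters, while the guarded run leaves it reading the local `c2_r1`. A real implementation would compare estimated costs instead of applying a fixed same-cluster rule.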

    Acknowledgements
    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
      1.1 Motivations
      1.2 Thesis Overview
    2 Background
      2.1 PAC DSP Architecture
      2.2 Infrastructure Designs
        2.2.1 Compiler Infrastructure
        2.2.2 Code Expansion and Clustered Partitioning for PAC DSP
        2.2.3 Copy Propagation in EBO Phase
    3 Enhanced Data Flow Analysis
      3.1 Problem Specifications
      3.2 An EDFA Algorithm and Cost Models
      3.3 An Advanced Estimation Algorithm
      3.4 Example
    4 Experiments and Discussions
      4.1 Experimental Results
      4.2 Related Work and Discussions
    5 Conclusion
      5.1 Summary
      5.2 Future Work
    Bibliography

