
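Graduate student: Ming-Yu Hung (洪明郁)
Thesis title: Compiler supports for Optimizing Speculative Multithreading Architecture (支援推測式多緒計算機結構的編譯器設計)
Advisor: Jenq-Kuen Lee (李政崑)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: English
Number of pages: 39
Keywords (Chinese): 推測式多緒計算機結構, 相依關係分析, 同名機率資訊分析, 平行化
Keywords (English): Speculative multithreading, Dependence analysis, Probabilistic points-to analysis, Parallelization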
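  • With advances in VLSI technology, many special-purpose features can now be added to a single processor. The speculative multithreading (SpMT) architecture is one of them. It supports speculative execution and multithreading, and it can achieve parallelism across threads. Such an architecture can speed up ordinary programs through speculative, multithreaded parallel execution. However, an SpMT architecture can misspeculate, so when a program is highly dependent, performance may degrade rather than improve: on a misspeculation, recovery must be performed, and the recovery code must first squash the existing threads and then execute the original program. Quantifying the dependences in a program is therefore important for an SpMT architecture, because the quantified result can be used to decide whether to execute speculatively, which prevents performance degradation.
    This thesis proposes a method for quantifying the dependences between loop iterations; this information can be used to judge whether a loop is suitable for speculative multithreaded execution. We first propose a method (PPA) for collecting pointer alias probability information in a program, and then use PPA to derive the dependence probabilities between pointers. We then use the derived dependence probabilities so that the speculative multithreading machine always makes a correct decision and avoids performance degradation. Our experimental platform is the SImulator for Multithreaded Computer Architectures (SIMCA). SIMCA's special inserted instructions are used to simulate the behavior of SpMT.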


    With the progress of VLSI technology, more and more features are being added to a single processor. The speculative multithreading (SpMT) architecture is one of them. It combines speculative execution and multithreading in one processor, and it can exploit thread-level parallelism that cannot be identified statically. Speedup can be obtained by speculatively executing, in parallel, threads extracted from a sequential program. However, performance may degrade if the threads are highly dependent. A recovery mechanism is activated when a speculative thread violates sequential semantics. Recovery usually incurs a very high penalty, because all live threads must be squashed before the recovery code runs. It is therefore essential for SpMT to quantify the degree of dependence and to turn off speculation if the degree of loop-carried dependence exceeds a certain threshold.
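The trade-off described above can be illustrated with a minimal sketch. The cost model below is an assumption made for illustration and is not taken from the thesis: a loop either runs speculatively on `n_threads` thread units with ideal speedup, or, with probability `dep_prob`, misspeculates and pays a squash penalty plus a sequential re-execution. All function and parameter names are hypothetical.

```python
def expected_speculative_time(seq_time, n_threads, dep_prob, squash_penalty):
    """Expected loop time under speculation (illustrative model, not the thesis's)."""
    ideal = seq_time / n_threads  # time if speculation succeeds
    # On misspeculation the parallel attempt is wasted, the threads are
    # squashed, and the loop is re-executed sequentially.
    failed = ideal + squash_penalty + seq_time
    return (1.0 - dep_prob) * ideal + dep_prob * failed


def break_even_probability(seq_time, n_threads, squash_penalty):
    """Dependence probability at which speculation stops paying off."""
    ideal = seq_time / n_threads
    return (seq_time - ideal) / (squash_penalty + seq_time)


def should_speculate(seq_time, n_threads, dep_prob, squash_penalty):
    """Speculate only if the expected time beats sequential execution."""
    expected = expected_speculative_time(seq_time, n_threads, dep_prob, squash_penalty)
    return expected < seq_time
```

Under this assumed model, a loop with a sequential time of 100 units, 4 thread units, and a squash penalty of 50 units is worth speculating on only while its dependence probability stays below 0.5.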
    This thesis presents a technique that quantitatively computes loop-carried dependences; this information can be used to determine whether loop iterations should be executed in parallel by speculative threads. The technique has two steps. First, probabilistic points-to analysis is performed to estimate the probabilities of points-to relationships when the program contains pointer references, so that the degree of dependence between loop iterations can be computed quantitatively. Second, the compiler uses this information to direct thread-level speculation. Experimental results show that compiler-directed thread-level speculation based on this information always led the architecture to make the right decision on the experimental platform, the SImulator for Multithreaded Computer Architectures (SIMCA). SIMCA is made to model an SpMT architecture by inserting SIMCA-specific instructions.
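As a rough illustration of how points-to probabilities could feed a dependence estimate, the sketch below represents each pointer as a probability distribution over abstract memory locations. The independence assumptions, the data layout, and the function names are hypothetical and chosen for illustration; the thesis's actual definition of data dependence probability (Chapter 3) is not reproduced in this record.

```python
from math import prod


def alias_probability(p, q):
    """Probability that two pointers refer to the same location.

    p and q map abstract location names to points-to probabilities.
    Assumes the two pointers' targets are independent.
    """
    return sum(prob * q.get(loc, 0.0) for loc, prob in p.items())


def loop_dependence_probability(pairs):
    """Probability that at least one (write, read) pair aliases across iterations.

    pairs is a list of (write_distribution, read_distribution) tuples.
    Assumes the pairs alias independently of one another.
    """
    return 1.0 - prod(1.0 - alias_probability(w, r) for w, r in pairs)


# Example: a store through p and a load through q in the next iteration.
p = {"a": 0.5, "b": 0.5}   # p points to a or b with equal probability
q = {"a": 1.0}             # q always points to a
estimate = loop_dependence_probability([(p, q)])
```

A compiler could compare an estimate like this against a threshold to decide whether a loop is a candidate for speculative threads.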

    Acknowledgements
    Abstract
    Contents
    List of Figures
    List of Tables
    1 Introduction
    1.1 Thesis Overview
    1.2 Related Work
    2 Speculative Multithreading
    2.1 Speculative Architecture
    2.2 Simultaneous Multithreading Architecture
    2.3 SpMT Architecture and Simulator
    3 Data Dependence Probability
    3.1 Definition of Data Dependence Probability
    3.2 Optimization Target and Cost Model
    4 Experiments
    4.1 Simulation
    4.2 Experimental Results
    5 Conclusion
    5.1 Summary
    5.2 Future Work
    Bibliography
    Appendix


