
Graduate student: Rong-Guey Chang (張榮貴)
Thesis title: Compiler Optimizations with Parallel Sparse Computations on Distributed Memory Environments (在分散式記憶體環境平行稀疏運算之編譯器最佳化)
Advisors: Dr. Jenq Kuen Lee (李政崑), Dr. Tyng-Ruey Chuang (莊庭瑞)
Oral defense committee:
Degree: Doctor
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2000
Academic year of graduation: 89
Language: Chinese
Number of pages: 145
Keywords (Chinese): 平行編譯器, 稀疏運算, 編譯器最佳化
Keywords (English): Parallel Compilers, Sparse Computations, Compiler Optimizations
    Fortran 90 provides many array intrinsic operations, and each of them can be applied to higher-dimensional arrays. These array intrinsics play an increasingly important role in data-parallel programming; however, Fortran 90's array intrinsics offer no support for sparse array computations.

    In this dissertation we build an efficient sparse library to address this problem. Our sparse library adopts a two-level design. At the high level we provide an overloadable interface for each array intrinsic operation; at the low level we implement, for each overloadable interface, the routines corresponding to the various compression and distribution schemes. These low-level routines communicate through MPI. In this way users need not consider the low-level details, and sparse programs can be parallelized with our sparse library.

    In the course of our experiments and design, we found that sparsity-ratio information is closely tied to the performance of sparse programs. This dissertation therefore proposes a probabilistic inference method for deriving the sparsity-ratio information of sparse programs. We first handle the case where the non-zero elements of a sparse array are uniformly distributed. For the case where the non-zero elements are non-uniformly distributed, we first present a method to locate the regions where non-zero elements are concentrated, and then use the resulting sparsity-ratio information to develop a segmented probabilistic inference method.

    On the other hand, once the sparsity-ratio information is available, we can use it to optimize the mapping from the high-level overloadable interfaces of our sparse library to the low-level compression and distribution routines. Since this problem is NP-hard, we develop a heuristic two-phase algorithm based on annotated program graphs to solve it. In the first phase we use a coarsened program graph, reduced from the annotated program graph, together with a cost model to automatically select compression schemes for sparse arrays. In the second phase we use the annotated program graph and the cost model to automatically select distribution schemes for sparse arrays.

    In addition, we conduct experiments on run-time parallel sparse computation with Java continuous compilation techniques and on parallel sparse computation over PC clusters.


    Fortran 90 provides a rich set of array intrinsic functions, each of which operates concurrently on the elements of multi-dimensional array objects. These intrinsics provide a rich source of parallelism and play an increasingly important role in automatic support of data-parallel programming. However, no such support exists when these intrinsic functions are applied to sparse data sets. In this dissertation, we first address this open gap by presenting an efficient library for parallel sparse computations with Fortran 90 array intrinsic operations. Our method provides both compression schemes and distribution schemes on distributed memory environments, applicable to higher-dimensional sparse arrays. This way, programmers need not worry about these low-level details: sparse programs can be expressed concisely using array expressions and parallelized with the help of our library. Our sparse libraries are built for the array intrinsics of Fortran 90 and include an extensive set of array operations such as CSHIFT, EOSHIFT, MATMUL, MERGE, PACK, SUM, RESHAPE, SPREAD, TRANSPOSE, UNPACK, and section moves.

    In addition, we provide a complete complexity analysis for our sparse implementation. The complexity of our algorithms is proportional to the number of non-zero elements in the arrays, which is consistent with the conventional design criteria for sparse algorithms and data structures.
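
    As a point of reference, the fragment below is a small dense Fortran 90 program written in the array-intrinsic style that the library overloads (MATMUL, CSHIFT, PACK, SUM). It is only an illustration of the kind of source code the library targets; the array names and sizes are made up here, and the library's actual overloaded interfaces are not shown.

    ! Dense Fortran 90 code in the array-intrinsic style the sparse
    ! library overloads; array names and sizes are illustrative only.
    program array_intrinsics_demo
      implicit none
      real :: a(4,4), b(4,4), c(4,4)
      real, allocatable :: nz(:)

      call random_number(a)
      call random_number(b)
      where (a < 0.8) a = 0.0          ! make A mostly zero, i.e. "sparse"

      c = matmul(a, b)                 ! matrix product
      b = cshift(a, shift=1, dim=2)    ! circular shift along dimension 2
      allocate(nz(count(a /= 0.0)))
      nz = pack(a, mask=(a /= 0.0))    ! gather the non-zero elements
      print *, 'sum of products     =', sum(c)
      print *, 'number of non-zeros =', size(nz)
    end program array_intrinsics_demo

    The intent of the library described above is that such array expressions can stay as they are, while the overloaded low-level routines supply the compression, distribution, and MPI communication.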

    In the process of our experiments and design, we found that the sparsity information of sparse arrays is very critical for performance issues such as compression, distribution, and communication cost. It thus becomes crucial to obtain the sparsity information for the arrays of our programs. In this dissertation, we provide a solution to this problem: probabilistic inference schemes that estimate the sparsity ratios of target arrays produced by Fortran 90 array operations and intrinsics. We first present an inference scheme to estimate the sparsity ratio of the target array of an expression using array intrinsic functions of Fortran 90, assuming a uniform distribution of sparsity. Next, we discuss the issues raised by non-uniform distributions of sparse elements. We divide the problem into two categories: elementwise array operations and transformational array operations. For sparse arrays operated on by elementwise array operations, we present a segmented inference scheme with the flavor of lattice calculus to predict the sparse structures of arrays. For transformational array operations with non-uniform distributions, we abstractly interpret, pointwise, the probability that each target array element is zero, and then present a sampling algorithm to obtain the sparsity structures. Our work gives sparse inference schemes for the complete set of array operations and array intrinsics of Fortran 90 with uniform or non-uniform distributions.
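
    As a simple illustration of the uniform case (not the dissertation's exact derivation), suppose the non-zero positions of arrays A and B are independent and uniformly distributed with non-zero ratios s_A and s_B; then, ignoring numerical cancellation, elementwise product and sum satisfy

    % Illustrative uniform-distribution estimates; s_X denotes the
    % probability that an element of X is non-zero.
    \[
      s_{A \ast B} = s_A \, s_B,
      \qquad
      s_{A + B} = 1 - (1 - s_A)(1 - s_B).
    \]

    Since transformational intrinsics such as TRANSPOSE and CSHIFT only permute elements, they preserve the overall non-zero count; what changes is where the non-zeros sit, which is why the segmented and pointwise schemes above track positional structure for non-uniform distributions.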

    On the other hand, because our sparse library uses a two-level design, it raises an interesting optimization problem. The low-level routines require the input sparse arrays of each function to be specified with compression and distribution schemes. In the high-level representation, sparse array functions are overloaded behind the array intrinsic interfaces, so that programmers need not be concerned with the low-level details. What, then, are the strategies for transforming the high-level representations into low-level routines by automatically selecting and supplying distribution and compression schemes for sparse data sets? In this dissertation, we propose solutions to this optimization issue. The optimization problem is shown to be NP-hard. We develop a heuristic algorithm based on annotated program graphs that selects compression schemes and distribution schemes in two phases. In the first phase, we reduce the graph into a coarse graph; a tree-pruning algorithm based on the optimal solution is then applied to the reduced graph to select compression schemes. In the second phase, the distribution schemes are selected based on the original annotated graph and pruning algorithms. The algorithm is shown to be practical.
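
    For flavor only, the toy program below mimics the two-phase structure in the simplest possible way: it greedily picks, for each array, first the compression scheme and then the distribution scheme with the lowest estimated cost. It is not the dissertation's algorithm; the graph reduction and tree pruning are omitted, and every name and cost number is invented.

    ! Toy two-phase selection: per-array greedy choice by estimated cost.
    ! Scheme counts, array count, and cost values are all hypothetical.
    program two_phase_selection
      implicit none
      integer, parameter :: narrays = 3, ncomp = 2, ndist = 2
      real    :: comp_cost(narrays, ncomp), dist_cost(narrays, ndist)
      integer :: comp_choice(narrays), dist_choice(narrays), loc(1), i

      ! Hypothetical costs; in practice they would come from a cost
      ! model applied to the inferred sparsity ratios.
      comp_cost = reshape((/ 1.0, 2.0, 3.0,  2.5, 1.5, 2.0 /), (/ narrays, ncomp /))
      dist_cost = reshape((/ 4.0, 1.0, 2.0,  3.0, 2.0, 2.5 /), (/ narrays, ndist /))

      do i = 1, narrays                 ! phase 1: compression schemes
        loc = minloc(comp_cost(i, :))
        comp_choice(i) = loc(1)
      end do
      do i = 1, narrays                 ! phase 2: distribution schemes
        loc = minloc(dist_cost(i, :))
        dist_choice(i) = loc(1)
      end do

      print *, 'compression choices :', comp_choice
      print *, 'distribution choices:', dist_choice
    end program two_phase_selection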

    In addition, we present a generic matrix class in Java and a runtime environment with continuous compilation, aiming to support automatic parallelization of sparse computations on distributed environments. Moreover, we observe the performance of our sparse supports for the array intrinsics of Fortran 90 built on top of PC-based networks of clusters.

