
Author: Shao, Yu-Wen (邵毓文)
Thesis title: An AI Accelerator MLIR Compiler and Simulator for 2D Mesh Architecture (用於2D Mesh AI加速器的MLIR編譯器與模擬器)
Advisor: Lee, Jenq-Kuen (李政崑)
Committee members: Hung, Ming-Yu (洪明郁); Chang, Yuan-Ming (張元銘)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2023
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 25
Keywords: AI accelerator, 2D mesh, MLIR, GEM5
  • This work presents an MLIR-based compiler and a GEM5-based simulator that together provide a compilation and simulation flow for an AI accelerator built on a 2D mesh architecture. To support the 2D mesh in the MLIR compiler, we propose a messaging protocol, the "MPI Lite Library", which specifies the dimensions of the 2D mesh and provides message-sending and message-receiving methods. To support the "MPI Lite Library" in MLIR, we implement a dialect called "Aiengine" and a pass that matches and converts the library's functions. To enable GEM5 to simulate the program, we also implement an MLIR pass that splits the program into a host part and one part for each tile of the 2D mesh. The compiled programs are then simulated on GEM5 to measure the running time of the host and of each tile, and a cost model is applied to these measurements to obtain the overall running time of the 2D mesh. We also show why program placement and routing matter on the 2D mesh and provide principles for optimizing them. Finally, to verify these principles and demonstrate the effectiveness of the compiler and simulator, we conduct an end-to-end experiment, from AI model to timing simulation, under different program placements and routings.
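The abstract does not spell out the cost model or the routing scheme, so the following is only a minimal sketch of how placement could change the overall running time on a 2D mesh. Everything in it is an assumption made for illustration: dimension-ordered (XY) routing, a fixed per-hop latency, fully serialized messages, and the hypothetical names `xy_route`, `hops`, and `mesh_time`. None of these are taken from the thesis.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (row, col) coordinate of a tile on the mesh


def xy_route(src: Tile, dst: Tile) -> List[Tile]:
    """Tiles visited by dimension-ordered routing (assumed): columns first, then rows."""
    row, col = src
    path = [src]
    step = 1 if dst[1] > col else -1
    while col != dst[1]:  # travel along the row until the column matches
        col += step
        path.append((row, col))
    step = 1 if dst[0] > row else -1
    while row != dst[0]:  # then travel along the column until the row matches
        row += step
        path.append((row, col))
    return path


def hops(src: Tile, dst: Tile) -> int:
    """Number of links a message crosses between two tiles."""
    return len(xy_route(src, dst)) - 1


def mesh_time(host_time: float,
              tile_time: Dict[Tile, float],
              messages: List[Tuple[Tile, Tile]],
              hop_latency: float) -> float:
    """Toy cost model (assumed): host time + slowest tile + serialized hop latency."""
    comm = sum(hops(s, d) for s, d in messages) * hop_latency
    return host_time + max(tile_time.values()) + comm


# A three-stage pipeline A -> B -> C placed in two ways on a 2x2 mesh.
near = {"A": (0, 0), "B": (0, 1), "C": (1, 1)}  # neighbouring stages are adjacent
far = {"A": (0, 0), "B": (1, 1), "C": (0, 1)}   # A and B sit on opposite corners
stage_time = {"A": 5.0, "B": 7.0, "C": 6.0}     # made-up per-tile running times

for name, placement in (("near", near), ("far", far)):
    tiles = {placement[s]: t for s, t in stage_time.items()}
    msgs = [(placement["A"], placement["B"]), (placement["B"], placement["C"])]
    print(name, mesh_time(10.0, tiles, msgs, hop_latency=2.0))
```

With these made-up numbers the placement that keeps communicating stages adjacent crosses two links in total while the other crosses three, so only the communication term of the toy model differs between the two runs.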

    Abstract (in Chinese)
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Background
        2.1 MLIR
        2.2 Polygeist
        2.3 GEM5
    3 Compiler and Simulator for 2D Mesh
        3.1 AI Accelerator Simulator Design
        3.2 MPI Lite Library Messaging Protocol
        3.3 MLIR Compiler
            3.3.1 Aiengine Dialect
            3.3.2 OP Conversion
            3.3.3 Program Split
        3.4 Routing Configurations
    4 Experiment
        4.1 Environment
        4.2 AI Accelerator Timing Simulation
    5 Conclusion
    Bibliography

