Graduate Student: 梁耕銘 Liang, Geng-Ming
Thesis Title: SSHO:類神經網路的結構式稀疏張量與高階綜合優化器 (SSHO: Structured Sparse HLS Optimizer for Neural Networks)
Advisor: 李政崑 Lee, Jenq-Kuen
Committee Members: 游逸平 You, Yi-Ping; 洪明郁 Hung, Ming-Yu
Degree: Master
Department: 電機資訊學院 資訊工程學系 (College of Electrical Engineering and Computer Science, Department of Computer Science)
Year of Publication: 2024
Academic Year of Graduation: 112
Language: Chinese
Number of Pages: 40
Keywords: MLIR, HLS (High-Level Synthesis), LLVM, Compiler, Sparse Tensor
Abstract:
In recent years, large language models (LLMs) have become one of the most prominent topics in computing and have profoundly changed our lives. Many devices now add AI features to enhance the user experience, but how can these devices afford such high storage and computation requirements? Sparse computing is a classic and efficient way to address this problem: by neither storing nor computing zero values, we can greatly reduce latency and memory usage. Well-known AI frameworks already implement sparse computing, and MLIR provides a Sparse Tensor dialect that supports sparse compression formats and a compilation flow down to LLVM. With support for specific accelerators, we can make the most of our designs, and HLS (High-Level Synthesis) has become a fast way to prototype them. However, MLIR only supports classic sparse compression formats, which are not hardware-friendly. Moreover, the LLVM IR generated from an MLIR AI model cannot be used by HLS directly because of version differences and coding-style mismatches. In this thesis, we propose a new sparse compression scheme that is easier to parallelize in hardware and preserves accuracy by pruning only small weights. We choose convolutional neural network (CNN) models to demonstrate the performance of our design. The results show that, at high sparsity, we maintain accuracy while reducing both computation time and storage.
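The abstract describes a structured sparse compression that keeps a fixed number of non-zero weights per block, so hardware can process every block with the same amount of work. The sketch below is only an illustration of that general idea, assuming an N:M-style layout with hypothetical names (BlockSparse, compress_blocks, sparse_dot) and parameters (M = 8 weights per block, N = 2 kept); it is not the actual SSHO format or its MLIR/HLS flow.

```cpp
// Illustrative sketch of a fixed N:M block-sparse format (not the SSHO scheme).
// Assumes the weight vector length is a multiple of M for brevity.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int M = 8;  // weights per block
constexpr int N = 2;  // nonzeros kept per block (fixed => easy to pipeline)

struct BlockSparse {
    std::vector<float>   values;   // N values per block
    std::vector<uint8_t> offsets;  // position of each value inside its block
};

// Keep the N largest-magnitude weights of every M-wide block; prune the rest.
BlockSparse compress_blocks(const std::vector<float>& w) {
    BlockSparse out;
    for (size_t base = 0; base < w.size(); base += M) {
        int idx[M];
        for (int i = 0; i < M; ++i) idx[i] = i;
        std::partial_sort(idx, idx + N, idx + M, [&](int a, int b) {
            return std::fabs(w[base + a]) > std::fabs(w[base + b]);
        });
        for (int k = 0; k < N; ++k) {
            out.values.push_back(w[base + idx[k]]);
            out.offsets.push_back(static_cast<uint8_t>(idx[k]));
        }
    }
    return out;
}

// Dot product of compressed weights with a dense activation vector.
float sparse_dot(const BlockSparse& w, const std::vector<float>& x) {
    float acc = 0.0f;
    size_t num_blocks = w.offsets.size() / N;
    for (size_t b = 0; b < num_blocks; ++b) {
        for (int k = 0; k < N; ++k) {            // constant trip count
            size_t j = b * N + k;
            acc += w.values[j] * x[b * M + w.offsets[j]];
        }
    }
    return acc;
}
```

Because every block contributes exactly N multiply-accumulates, the inner loop has a constant trip count that an HLS tool can fully unroll or pipeline, which is what makes such block-structured formats more hardware-friendly than classic CSR-style compression with variable row lengths.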