| Graduate Student | 廖敏君 Liao, Min-Chun |
|---|---|
| Thesis Title | 針對基於可變電阻式記憶體的加速器之機器學習編譯流程 (Machine Learning Compilation Flow for a ReRAM-based Accelerator) |
| Advisor | 金仲達 King, Chung-Ta |
| Committee Members | 黃稚存 Huang, Chih-Tsun; 陳耀華 Chen, Yao-Hua |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Publication Year | 2022 |
| Graduation Academic Year | 111 |
| Language | English |
| Number of Pages | 33 |
| Keywords (Chinese) | 可變電阻式記憶體 (ReRAM), 編譯器 (compiler), 參數壓縮 (weight compression) |
| Keywords (English) | ReRAM, Compiler, Weight compression |
ReRAM-based accelerators have received much attention recently for their ability to perform in-memory operations that accelerate the execution of neural network models. However, deploying a model to such an accelerator requires a great deal of tedious work, and a machine learning compiler can help ease this burden on users. Unfortunately, the popular TVM compiler does not yet support ReRAM-based accelerators as a backend. In this thesis, we modify TVM so that it can compile neural network models and run the generated code in a ReRAM-based accelerator simulation environment provided by ITRI. The enhanced TVM handles not only general convolution and fully connected layers but also sparse operations. We apply the developed compilation flow to analyze the compression achieved on the crossbar array by different row-reordering algorithms combined with weight compression, including two algorithms proposed in this thesis. Experiments show that both proposed methods achieve good results within a short execution time, so users can choose a suitable algorithm according to their time budget.
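To make the idea of measuring the compression effect of row reordering on a crossbar array concrete, the following Python sketch is a minimal illustration, not the algorithms from the thesis: the 128x128 tile size, the greedy sort-by-sparsity-pattern heuristic, the synthetic sparse layer, and the function names `count_tiles` and `reorder_rows_by_pattern` are all assumptions made for this example.

```python
# Hypothetical sketch: reorder the rows of a sparse weight matrix so that
# non-zero entries cluster into fewer crossbar tiles, then estimate how many
# XBAR_ROWS x XBAR_COLS tiles still hold at least one non-zero weight.
import numpy as np

XBAR_ROWS, XBAR_COLS = 128, 128  # assumed crossbar dimensions


def count_tiles(weights: np.ndarray) -> int:
    """Count crossbar tiles that contain at least one non-zero weight."""
    rows, cols = weights.shape
    tiles = 0
    for r in range(0, rows, XBAR_ROWS):
        for c in range(0, cols, XBAR_COLS):
            if np.any(weights[r:r + XBAR_ROWS, c:c + XBAR_COLS]):
                tiles += 1
    return tiles


def reorder_rows_by_pattern(weights: np.ndarray) -> np.ndarray:
    """Greedy heuristic: sort rows by the set of column blocks they touch, so rows
    with similar sparsity patterns land in the same tile and all-zero tiles can
    be skipped (i.e. never mapped to a physical crossbar)."""
    keys = [tuple(sorted(set(np.nonzero(row)[0] // XBAR_COLS))) for row in weights]
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return weights[order]


if __name__ == "__main__":
    # Build a synthetic block-sparse layer: each row has weights in one column block.
    rng = np.random.default_rng(0)
    w = np.zeros((512, 512))
    for i in range(512):
        blk = rng.integers(0, 512 // XBAR_COLS)
        w[i, blk * XBAR_COLS:(blk + 1) * XBAR_COLS] = rng.random(XBAR_COLS)

    print("tiles before reordering:", count_tiles(w))
    print("tiles after reordering: ", count_tiles(reorder_rows_by_pattern(w)))
```

The difference between the two reported tile counts gives a rough proxy for the crossbar-array compression a reordering heuristic buys; how large that gain is depends entirely on how structured the layer's sparsity is, which is why comparing different reordering algorithms under a compilation flow like the one developed here is useful.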