| Field | Value |
|---|---|
| Student | 李杰倫 Lee, Jye-Luen |
| Thesis Title | 適用於隨機存取記憶體內運算加速器之卷積神經網路壓縮演算法 (A Model Compression Algorithm of Convolution Neural Network for SRAM-based Computing-In-Memory Accelerator) |
| Advisor | 鄭桂忠 Tang, Kea-Tiong |
| Committee Members | 盧峙丞 Lu, Chih-Cheng; 張孟凡 Chang, Meng-Fan; 黃朝宗 Huang, Chao-Tsung |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2021 |
| Academic Year | 109 |
| Language | Chinese |
| Pages | 58 |
| Keywords (Chinese) | 剪枝, 量化, 記憶體內運算, 深度學習 |
| Keywords (English) | Pruning, Quantization, Computing-In-Memory, Deep learning |
Abstract (Chinese):
Computing-in-memory (CIM) circuits show excellent potential for neuromorphic computing, which involves massive parallel operations, and for achieving high energy efficiency. CIM is particularly well suited to convolutional neural networks, which must perform large numbers of matrix-vector multiplications.
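To make that connection concrete, the short Python sketch below (an illustrative example, not code from the thesis; the layer sizes are arbitrary assumptions) unrolls a convolution with `torch.nn.functional.unfold` into exactly the kind of matrix-vector products that a CIM macro accelerates.

```python
import torch
import torch.nn.functional as F

# Arbitrary example sizes (not from the thesis): one 3x3 conv layer.
x = torch.randn(1, 16, 8, 8)       # input feature map (N, C, H, W)
w = torch.randn(32, 16, 3, 3)      # conv weights (out_ch, in_ch, kH, kW)

# Standard convolution.
y_conv = F.conv2d(x, w, padding=1)

# The same computation as matrix-vector products: im2col unrolls every
# 3x3 receptive field into a column vector, and the weight tensor becomes
# a (32 x 144) matrix that could be stored across CIM macro columns.
# Each output pixel is then one matrix-vector multiplication.
cols = F.unfold(x, kernel_size=3, padding=1)   # (1, 16*3*3, 64)
w_mat = w.view(32, -1)                         # (32, 144)
y_mat = w_mat @ cols                           # 64 matrix-vector products
y_im2col = y_mat.view(1, 32, 8, 8)

assert torch.allclose(y_conv, y_im2col, atol=1e-4)
```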
In this work, a pruning framework for compressing convolutional neural networks is proposed that takes into account the hardware limitations and the operational characteristics of SRAM-based CIM. Based on the parallel-computation characteristics of SRAM CIM and the number of inputs and outputs per operation, a CIM-oriented pruning structure is designed. During training, a group-sparsity algorithm sparsifies the network according to this structure, so that the trained network contains a large number of zero-valued weights and memory accesses are greatly reduced. In addition, an adaptive regularization term is proposed to balance the sparsity ratio across the layers of the network, so that layers with heavier computation obtain higher sparsity ratios, and the alternating direction method of multipliers (ADMM) is applied during fine-tuning to further constrain the number of non-zero weights in each convolution kernel and thereby simplify the hardware accelerator design.
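The thesis text above only describes the training procedure in prose; as a minimal sketch of the group-sparsity idea under assumed settings (the grouping of nine weights per group, the coefficient `lambda_g`, and the toy model are hypothetical, standing in for the CIM-derived pruning structure and the per-layer regularization strength), a PyTorch group-lasso penalty might look like this:

```python
import torch
import torch.nn as nn

def group_lasso_penalty(conv: nn.Conv2d, group_size: int = 9) -> torch.Tensor:
    """Sum of L2 norms over fixed-size weight groups (group lasso).

    Hypothetical grouping: the flattened weights of each output channel are
    split into chunks of `group_size`, standing in for the weight slices that
    would be mapped onto one CIM macro column; groups driven to zero can be
    skipped entirely at inference time.
    """
    w = conv.weight.view(conv.out_channels, -1)          # (out_ch, in_ch*kH*kW)
    groups = w.view(conv.out_channels, -1, group_size)   # assumes divisibility
    return groups.norm(p=2, dim=-1).sum()

# Toy usage inside one training step; lambda_g is an assumed hyperparameter.
model = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1),
                      nn.Flatten(),
                      nn.Linear(32 * 8 * 8, 10))
criterion = nn.CrossEntropyLoss()
lambda_g = 1e-4

x = torch.randn(4, 16, 8, 8)
targets = torch.randint(0, 10, (4,))
task_loss = criterion(model(x), targets)
reg = sum(group_lasso_penalty(m) for m in model.modules()
          if isinstance(m, nn.Conv2d))
loss = task_loss + lambda_g * reg
loss.backward()
```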
The proposed adaptive-regularization training algorithm is validated on VGG16, ResNet20, and ResNet18 for CIFAR-10 and CIFAR-100 image classification. The experimental results show that the pruned networks achieve more than 20x parameter compression. Under this condition, compared with a group-sparsity algorithm using a fixed regularization coefficient, VGG16 and ResNet18 reduce the number of multiplications by more than 20% with only about a 1% accuracy drop on both CIFAR-10 and CIFAR-100. Finally, hardware simulation shows that the algorithm achieves a speedup of more than 5x.
Abstract (English):
Computing-in-memory (CIM) exhibits excellent potential for AI accelerators that involve massive parallel computation and for achieving high energy efficiency. CIM is especially suitable for convolutional neural networks (CNNs), which need to perform large numbers of matrix-vector multiplications.
In this research, we propose a model compression framework that implements a pruning algorithm for an SRAM-CIM hardware accelerator. By considering the parallel operation characteristics of SRAM CIM and the number of inputs and outputs per operation, a custom pruning structure is proposed. This work uses a sparsity-training method to gather the zero-valued weights into predefined patterns that fit the CIM macro. Furthermore, to optimize inference performance when the neural network is deployed on CIM, this work proposes an adaptive-regularization sparsity-training algorithm that balances the sparsity ratio of each layer, so that layers with larger computation requirements obtain higher sparsity ratios. Finally, this work adopts the alternating direction method of multipliers (ADMM) to constrain the number of non-zero weights in each kernel and thus simplify the hardware design.
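As a hedged illustration of the ADMM step mentioned above (not the thesis's actual implementation), ADMM-based pruning typically alternates gradient updates with a Euclidean projection of an auxiliary copy of the weights onto the constraint set; with a per-kernel non-zero budget `k` (an assumed value below), that projection simply keeps the k largest-magnitude weights in each kernel:

```python
import torch

def project_topk_per_kernel(weight: torch.Tensor, k: int) -> torch.Tensor:
    """Euclidean projection onto tensors with at most k non-zero entries per
    kernel (one kernel = one output-channel slice of the weight tensor).

    In ADMM-based pruning, an auxiliary variable Z is projected like this at
    each iteration, while the primal weights follow the task loss plus a
    quadratic penalty pulling them toward Z.
    """
    out_ch = weight.shape[0]
    flat = weight.view(out_ch, -1)
    # Indices of the k largest-magnitude weights in each kernel.
    topk_idx = flat.abs().topk(k, dim=1).indices
    mask = torch.zeros_like(flat)
    mask.scatter_(1, topk_idx, 1.0)
    return (flat * mask).view_as(weight)

# Example: keep at most 4 non-zero weights per 8x3x3 kernel (assumed budget).
w = torch.randn(16, 8, 3, 3)
z = project_topk_per_kernel(w, k=4)
print((z.view(16, -1) != 0).sum(dim=1))   # 4 non-zeros per kernel
```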
We evaluated the proposed model compression framework on the CIFAR-10 and CIFAR-100 image classification tasks using ResNet20 and two larger networks, VGG16 and ResNet18. The experimental results show that the pruned networks achieve a compression rate of more than 20x and over 80% FLOP reduction. In addition, compared with traditional sparsity training, the proposed method reduces FLOPs by more than 20%. Moreover, compared with the unpruned networks, the pruned networks achieve more than a 5x speedup when deployed on the SRAM-CIM hardware accelerator.