| Field | Value |
|---|---|
| Student | 李杰倫 Lee, Jye-Luen |
| Thesis Title | 適用於隨機存取記憶體內運算加速器之卷積神經網路壓縮演算法 (A Model Compression Algorithm of Convolution Neural Network for SRAM-based Computing-In-Memory Accelerator) |
| Advisor | 鄭桂忠 Tang, Kea-Tiong |
| Committee Members | 盧峙丞 Lu, Chih-Cheng; 張孟凡 Chang, Meng-Fan; 黃朝宗 Huang, Chao-Tsung |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2021 |
| Academic Year | 109 |
| Language | Chinese |
| Pages | 58 |
| Keywords (Chinese) | 剪枝, 量化, 記憶體內運算, 深度學習 |
| Keywords (English) | Pruning, Quantization, Computing-In-Memory, Deep learning |
Abstract (Chinese):
Computing-in-memory (CIM) circuits show excellent potential for neuromorphic computing, which involves massive parallel operations, and for achieving high energy efficiency. CIM is particularly well suited to convolutional neural networks, which must perform large numbers of matrix-vector multiplications.
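To make that connection concrete, the short Python sketch below (an illustrative example, not code from the thesis; the layer sizes are arbitrary assumptions) unrolls a convolution with `torch.nn.functional.unfold` into exactly the kind of matrix-vector products that a CIM macro accelerates.

```python
import torch
import torch.nn.functional as F

# Arbitrary example sizes (not from the thesis): one 3x3 conv layer.
x = torch.randn(1, 16, 8, 8)       # input feature map (N, C, H, W)
w = torch.randn(32, 16, 3, 3)      # conv weights (out_ch, in_ch, kH, kW)

# Standard convolution.
y_conv = F.conv2d(x, w, padding=1)

# The same computation as matrix-vector products: im2col unrolls every
# 3x3 receptive field into a column vector, and the weight tensor becomes
# a (32 x 144) matrix that could be stored across CIM macro columns.
# Each output pixel is then one matrix-vector multiplication.
cols = F.unfold(x, kernel_size=3, padding=1)   # (1, 16*3*3, 64)
w_mat = w.view(32, -1)                         # (32, 144)
y_mat = w_mat @ cols                           # 64 matrix-vector products
y_im2col = y_mat.view(1, 32, 8, 8)

assert torch.allclose(y_conv, y_im2col, atol=1e-4)
```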
In this work, a pruning framework for compressing convolutional neural networks is proposed that takes into account the hardware limitations and the operational characteristics of SRAM-based CIM. Based on the parallel-computation characteristics of SRAM CIM and the number of inputs and outputs per operation, a CIM-oriented pruning structure is designed. During training, a group-sparsity algorithm sparsifies the network according to this structure, so that the trained network contains a large number of zero-valued weights and memory accesses are greatly reduced. In addition, an adaptive regularization term is proposed to balance the sparsity ratio across the layers of the network, so that layers with heavier computation obtain higher sparsity ratios, and the alternating direction method of multipliers (ADMM) is applied during fine-tuning to further constrain the number of non-zero weights in each convolution kernel and thereby simplify the hardware accelerator design.
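The thesis text above only describes the training procedure in prose; as a minimal sketch of the group-sparsity idea under assumed settings (the grouping of nine weights per group, the coefficient `lambda_g`, and the toy model are hypothetical, standing in for the CIM-derived pruning structure and the per-layer regularization strength), a PyTorch group-lasso penalty might look like this:

```python
import torch
import torch.nn as nn

def group_lasso_penalty(conv: nn.Conv2d, group_size: int = 9) -> torch.Tensor:
    """Sum of L2 norms over fixed-size weight groups (group lasso).

    Hypothetical grouping: the flattened weights of each output channel are
    split into chunks of `group_size`, standing in for the weight slices that
    would be mapped onto one CIM macro column; groups driven to zero can be
    skipped entirely at inference time.
    """
    w = conv.weight.view(conv.out_channels, -1)          # (out_ch, in_ch*kH*kW)
    groups = w.view(conv.out_channels, -1, group_size)   # assumes divisibility
    return groups.norm(p=2, dim=-1).sum()

# Toy usage inside one training step; lambda_g is an assumed hyperparameter.
model = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1),
                      nn.Flatten(),
                      nn.Linear(32 * 8 * 8, 10))
criterion = nn.CrossEntropyLoss()
lambda_g = 1e-4

x = torch.randn(4, 16, 8, 8)
targets = torch.randint(0, 10, (4,))
task_loss = criterion(model(x), targets)
reg = sum(group_lasso_penalty(m) for m in model.modules()
          if isinstance(m, nn.Conv2d))
loss = task_loss + lambda_g * reg
loss.backward()
```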
The proposed adaptive-regularization training algorithm is validated on VGG16, ResNet20, and ResNet18 for CIFAR-10 and CIFAR-100 image classification. The experimental results show that the pruned networks achieve more than 20x parameter compression. Under this condition, compared with a group-sparsity algorithm using a fixed regularization coefficient, VGG16 and ResNet18 reduce the number of multiplications by more than 20% with only about a 1% accuracy drop on both CIFAR-10 and CIFAR-100. Finally, hardware simulation shows that the algorithm achieves a speedup of more than 5x.
Abstract (English):
Computing-in-memory (CIM) exhibits excellent potential for AI accelerators that involve massive parallel computation and for achieving high energy efficiency. CIM is especially suitable for convolutional neural networks (CNNs), which need to perform large numbers of matrix-vector multiplications.
In this research, we propose a model compression framework that implements a pruning algorithm for an SRAM-CIM hardware accelerator. By considering the parallel operation characteristics of SRAM CIM and the number of inputs and outputs per operation, a custom pruning structure is proposed. This work uses a sparsity-training method to gather the zero-valued weights into predefined patterns that fit the CIM macro. Furthermore, to optimize inference performance when the neural network is deployed on CIM, this work proposes an adaptive-regularization sparsity-training algorithm that balances the sparsity ratio of each layer, so that layers with larger computation requirements obtain higher sparsity ratios. Finally, this work adopts the alternating direction method of multipliers (ADMM) to constrain the number of non-zero weights in each kernel and thus simplify the hardware design.
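As a hedged illustration of the ADMM step mentioned above (not the thesis's actual implementation), ADMM-based pruning typically alternates gradient updates with a Euclidean projection of an auxiliary copy of the weights onto the constraint set; with a per-kernel non-zero budget `k` (an assumed value below), that projection simply keeps the k largest-magnitude weights in each kernel:

```python
import torch

def project_topk_per_kernel(weight: torch.Tensor, k: int) -> torch.Tensor:
    """Euclidean projection onto tensors with at most k non-zero entries per
    kernel (one kernel = one output-channel slice of the weight tensor).

    In ADMM-based pruning, an auxiliary variable Z is projected like this at
    each iteration, while the primal weights follow the task loss plus a
    quadratic penalty pulling them toward Z.
    """
    out_ch = weight.shape[0]
    flat = weight.view(out_ch, -1)
    # Indices of the k largest-magnitude weights in each kernel.
    topk_idx = flat.abs().topk(k, dim=1).indices
    mask = torch.zeros_like(flat)
    mask.scatter_(1, topk_idx, 1.0)
    return (flat * mask).view_as(weight)

# Example: keep at most 4 non-zero weights per 8x3x3 kernel (assumed budget).
w = torch.randn(16, 8, 3, 3)
z = project_topk_per_kernel(w, k=4)
print((z.view(16, -1) != 0).sum(dim=1))   # 4 non-zeros per kernel
```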
We evaluated the proposed model compression framework on the CIFAR-10 and CIFAR-100 image classification tasks using ResNet20 and two larger networks, VGG16 and ResNet18. The experimental results show that the pruned networks achieve a compression rate of more than 20x and over 80% FLOP reduction. In addition, compared with traditional sparsity training, the proposed method reduces FLOPs by more than 20%. Moreover, compared with the unpruned networks, the pruned networks achieve more than a 5x speedup when deployed on the SRAM-CIM hardware accelerator.