| Field | Value |
|---|---|
| Graduate Student | 魏瑋辰 (Wei, Wei-Chen) |
| Thesis Title | IMCAQ:對應記憶體內運算硬體限制的一個基於重新參數化的深度神經網路量化方法 / IMCAQ: A Deep Neural Network Quantization Training Method Based on Reparameterization Corresponding to Hardware Limitation of In-Memory Computing |
| Advisor | 鄭桂忠 (Tang, Kea-Tiong) |
| Committee Members | 黃朝宗 (Huang, Chao-Tsung), 呂仁碩 (Liu, Ren-Shuo) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2020 |
| Academic Year of Graduation | 108 |
| Language | Chinese |
| Number of Pages | 49 |
| Keywords (Chinese) | 量化 (quantization), 壓縮 (compression), 記憶體內運算 (in-memory computing), 深度學習 (deep learning) |
| Keywords (English) | quantization, compression, in-memory computing, deep learning |
In-memory computing (IMC) circuits show excellent potential for neuromorphic computation involving massive parallelism and for achieving high energy efficiency. IMC is especially well suited to convolutional neural networks, which must perform large numbers of matrix-vector multiplications.
In this work, a quantization method called IMCQ is proposed to realize compact convolutional neural networks while accounting for the hardware limitations of nonvolatile in-memory computing (nvIMC). By considering the constraints of the inputs, memory cells, and sense amplifiers of nvIMC, and more specifically by controlling the bit width of the activations, the bit width of the network weights, the bit width of the matrix-vector multiplication (MVM) values, and the number of word lines opened in the memory array, nvIMC can be simulated during network inference so that the trained neural network can be deployed on IMC circuits. Furthermore, a Concrete-distribution-based quantization method is introduced into the MVM quantizer of IMCQ to mitigate the small read-margin problem caused by device variations in nvIMC, making the original quantization method better suited to IMC circuits; this advanced method is called IMCAQ.
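As a rough illustration of the hardware constraints described above, the following is a minimal sketch (not the thesis implementation) of how an nvIMC matrix-vector multiplication can be simulated: only a limited number of word lines are activated per read and each partial MVM result passes through a low-resolution quantizer standing in for the sense amplifier. The function names (`uniform_quantize`, `imc_mvm`) and the parameter values (`wl_per_read=9`, `mvm_bits=4`, the clipping range `mvm_range`) are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def uniform_quantize(x, num_bits, x_max):
    """Symmetric uniform quantizer standing in for the sense-amplifier readout."""
    levels = 2 ** num_bits - 1
    x = np.clip(x, -x_max, x_max)
    step = 2 * x_max / levels
    return np.round(x / step) * step

def imc_mvm(weights, activations, wl_per_read=9, mvm_bits=4, mvm_range=8.0):
    """Simulate one output column: accumulate quantized partial sums over word-line groups."""
    total = 0.0
    for start in range(0, len(weights), wl_per_read):
        partial = np.dot(weights[start:start + wl_per_read],
                         activations[start:start + wl_per_read])
        total += uniform_quantize(partial, mvm_bits, mvm_range)  # quantized partial MVM
    return total

# Toy usage: 27 rows read as 3 groups of 9 word lines, with low-bit weights/activations.
rng = np.random.default_rng(0)
w = rng.choice([-1.0, 0.0, 1.0], size=27)      # assumed ternary-valued 2-bit weights
a = rng.choice([0.0, 1.0, 2.0, 3.0], size=27)  # assumed 2-bit activations
print(imc_mvm(w, a))
```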
The effectiveness of the proposed IMCAQ is validated on LeNet and VGG-Net models whose convolutional-layer weights total less than 1 Mb, performing image classification on MNIST and CIFAR-10, respectively. The results show that, with weights and activations quantized to 2 bits, MVM values quantized to 4 bits, and 9 word lines opened, the accuracy on CIFAR-10 improves by 2.92% compared with the conventional straight-through-estimator-based IMCQ. With weights and activations quantized to 2 bits, quantizing MVM values to 4 bits reduces accuracy by only 0.05% on MNIST and 1.31% on CIFAR-10 compared with full-precision MVM values. The experimental results indicate that the proposed method is effective and useful for designing and fabricating chip systems intended for nvIMC.
In-memory computing (IMC) exhibits excellent potential for AI accelerators involving massive parallel computation and for achieving high energy efficiency. IMC is especially suitable for convolutional neural networks (CNNs), which need to perform large numbers of matrix-vector multiplications (MVMs).
In this work, we propose a quantization method, "IMCQ", that takes the hardware limitations of nonvolatile IMC (nvIMC) into account to implement compact CNNs. We simulate nvIMC for the parallel computation of multilevel MVMs by considering the constraints of the inputs, memory cells, and sense amplifiers in nvIMC; more specifically, we manage the resolution of the activations, weights, and MVM values and the number of word lines opened in the memory array. Furthermore, we introduce a Concrete-distribution-based quantization method into the MVM quantizer of IMCQ, which mitigates the small read-margin problem caused by device variations in nvIMC. We call this advanced method "IMCAQ".
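The paragraph above refers to a Concrete (Gumbel-softmax) relaxation for the MVM quantizer. Below is a minimal sketch of that general idea under stated assumptions: each value is softly assigned to the points of a uniform quantization grid so that gradients flow through the assignment instead of relying on a straight-through estimator. The function name `concrete_quantize` and the parameters `temperature` and `sigma`, as well as the 4-bit grid, are hypothetical choices for illustration and are not taken from the thesis.

```python
import numpy as np

def concrete_quantize(x, grid, temperature=0.5, sigma=0.3, rng=None):
    """Relaxed (differentiable) quantization of a 1-D array x onto `grid` during training."""
    rng = np.random.default_rng() if rng is None else rng
    # Unnormalized log-probabilities: grid points closer to x are more likely.
    logits = -np.abs(x[:, None] - grid[None, :]) / sigma
    # Sample Gumbel noise and apply the tempered softmax (Concrete distribution).
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    soft_onehot = np.exp(y - y.max(axis=1, keepdims=True))
    soft_onehot /= soft_onehot.sum(axis=1, keepdims=True)
    return soft_onehot @ grid  # soft quantized values; hard rounding is used at inference

# Example: relax a few MVM values onto an assumed 4-bit grid covering [-8, 7].
grid = np.linspace(-8.0, 7.0, 16)
x = np.array([-2.3, 0.7, 5.1])
print(concrete_quantize(x, grid, rng=np.random.default_rng(0)))
```

At low temperature the soft assignment approaches hard rounding, which is the usual trade-off when annealing a Concrete relaxation during training.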
We evaluated the performance of the proposed quantization methods on the MNIST and CIFAR-10 image classification tasks using LeNet and VGG-Net, respectively. The results show a 2.92% improvement in CIFAR-10 accuracy over the conventional straight-through-estimator-based quantization method when weights and activations are quantized to 2 bits, MVM values are quantized to 4 bits, and 9 word lines are opened. The results also show accuracy drops of only 0.05% on MNIST and 1.31% on CIFAR-10 when MVM values are quantized to 4 bits rather than kept at full precision. These experimental results indicate that the proposed method is practical and useful for fabricating real chips intended for use on nvIMC platforms.