| Graduate Student: | Wang, Jing-Hong (王景鴻) |
|---|---|
| Thesis Title: | A Quantization Algorithm for Partial Sums of Convolution Neural Network Based on Computing-In-Memory Hardware (一個基於記憶體內運算硬體設計的卷積神經網路部分和量化演算法) |
| Advisor: | Chang, Meng-Fan (張孟凡) |
| Committee Members: | Chiou, Lih-Yih (邱瀝毅); Liu, Ren-Shuo (呂仁碩) |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2019 |
| Graduation Academic Year: | 108 |
| Language: | English |
| Number of Pages: | 36 |
| Keywords: | Convolution Neural Network, Partial Sum, Quantization, Computing in Memory |
As the wave of artificial intelligence intensifies and deep learning algorithms mature, research on hardware accelerators for deep neural networks (DNNs) has grown rapidly, and computing-in-memory (CIM) is one such approach. The advantage of computing-in-memory is that it combines computation and storage: operations are performed on the memory side and only the results are transferred out, which reduces data movement and saves both time and energy. However, as the main processing element of a deep-learning accelerator, its output is imperfect. If the exact result of an operation requires 8 bits, a CIM unit outputs only about 4 or 5 bits (i.e., a quantized result), and the output may also contain occasional computation errors. This output serves as a partial sum in a convolutional neural network, so its imperfection degrades the network's prediction accuracy.

To enable CIM units to run deep neural networks reliably, this thesis proposes a quantization method that tolerates the limited output precision of non-volatile resistive RAM (ReRAM) CIM macros and maintains prediction accuracy even when computation errors appear at the output. Using the ResNet residual network as the baseline, the thesis examines how different partial-sum quantization methods affect prediction accuracy and arrives at a final solution.

With a ResNet implemented using 2-bit activations and 3-bit weights, and with the probabilistic computation errors of the CIM unit taken into account, the proposed partial-sum quantization method improves prediction accuracy by 3.9% over linear quantization and by 1% over the nonlinear quantization method proposed by Shimeng Yu's group.
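To make the limited-precision partial-sum readout concrete, the following is a minimal Python/NumPy sketch. It is an illustration only: the uniform (linear) quantizer, the 8-bit ideal range, and the single-level error injection are assumptions made for this example, not the thesis's actual quantization method or the real ReRAM macro's error model.

```python
import numpy as np

def linear_quantize_partial_sum(psum, full_bits=8, out_bits=4):
    """Uniformly (linearly) quantize a signed partial sum that would ideally
    need `full_bits` bits down to roughly `out_bits` bits of resolution,
    mimicking the limited readout precision of a CIM macro (assumed model)."""
    full_max = 2 ** (full_bits - 1) - 1        # e.g. +127 for an 8-bit range
    step = 2 ** (full_bits - out_bits)         # e.g. 16: keep only coarse levels
    clipped = np.clip(psum, -(full_max + 1), full_max)
    return int(np.round(clipped / step)) * step

def cim_partial_sum(acts, weights, full_bits=8, out_bits=4, err_prob=0.0, rng=None):
    """Toy model of one CIM multiply-and-accumulate:
    exact partial sum -> coarse quantized readout -> optional one-level error."""
    rng = rng if rng is not None else np.random.default_rng()
    ideal = int(np.dot(acts, weights))                    # exact partial sum
    readout = linear_quantize_partial_sum(ideal, full_bits, out_bits)
    if rng.random() < err_prob:                           # occasional readout error
        readout += int(rng.choice([-1, 1])) * 2 ** (full_bits - out_bits)
    return ideal, readout

# Example: 2-bit activations (0..3) and 3-bit signed weights (-4..3),
# one 3x3 slice of a convolution kernel.
rng = np.random.default_rng(0)
acts = rng.integers(0, 4, size=9)
weights = rng.integers(-4, 4, size=9)
print(cim_partial_sum(acts, weights, err_prob=0.1, rng=rng))
```

In this toy model a partial sum of, say, 37 is read out as 32 (the nearest multiple of 16); the thesis studies how to choose the quantization levels so that such coarse readouts, together with occasional errors, do not hurt the final ResNet prediction accuracy.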