Graduate Student: 黃維新 Huang, Wei-Hsing
Thesis Title: 應用於記憶體內運算友善神經網路之動態梯度修正演算法 (Dynamic Gradient Calibration Algorithm for Computing-in-Memory Friendly Neural Network)
Advisor: 張孟凡 Chang, Meng-Fan
Committee Members: 邱瀝毅 Chiou, Lih-Yih; 呂仁碩 Liu, Ren-Shuo
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Academic Year of Graduation: 108
Language: English
Number of Pages: 35
Chinese Keywords: 深度學習 (deep learning), 記憶體 (memory), 軟硬協同設計 (hardware-software co-design), 梯度修正 (gradient calibration)
Foreign Keywords: Hardware software co-design, Gradient Calibration
With the rapid development and wide application of artificial intelligence and deep learning neural networks, data movement and transmission have grown explosively. In a conventional processor based on the Von Neumann architecture, most of the energy is consumed in moving data, a limitation known as the Von Neumann bottleneck. Computing-in-memory was developed to overcome this bottleneck.
Conventional memory only stores data, whereas computing-in-memory provides both storage and computation. Instead of moving all of the operands out of the memory for processing, the computation is performed directly inside the memory array and only the results are transferred out, which reduces data movement. However, in-memory computation is essentially performed with analog signals, and analog computing faces growing challenges as process technology scales. In addition, with the increasing use of deep learning on mobile devices, energy consumption constraints must also be addressed.
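As a concrete, purely illustrative picture of the kind of bit-wise operation a computing-in-memory macro evaluates in place, the NumPy sketch below decomposes a multi-bit dot product into bit-plane partial sums: the per-bit column sum is what the memory array would compute on its bitlines, while the shift-and-add stays in the digital periphery. This is an assumption-based sketch, not code from the thesis.

```python
import numpy as np

# Illustrative sketch (not from the thesis): a 4-bit unsigned dot product
# decomposed into bit-plane partial sums, the kind of bit-wise operation a
# CIM macro can evaluate on its bitlines before shift-and-add in the periphery.

rng = np.random.default_rng(0)
x = rng.integers(0, 16, size=64)      # 4-bit activations
w = rng.integers(0, 2, size=64)       # binary weights stored in the array

ref = int(np.dot(x, w))               # conventional (digital) reference result

acc = 0
for b in range(4):                    # bit-serial: one activation bit-plane per cycle
    bit_plane = (x >> b) & 1          # one bit of every activation
    column_sum = int(np.dot(bit_plane, w))  # what the array computes in place
    acc += column_sum << b            # shift-and-add outside the array

assert acc == ref
print(acc)
```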
This thesis designs a convolutional neural network tailored to computing-in-memory. The proposed approach uses 1) a Hierarchical Bit-Wise Quantization algorithm and 2) a Dynamic Gradient Calibration algorithm. It reduces the energy consumption of the computing-in-memory unit by 22%~25% without a significant drop in accuracy.
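The abstract does not spell out the two algorithms, so the PyTorch sketch below only illustrates the general recipe that bit-wise quantized training builds on: quantize in the forward pass, propagate gradients with a straight-through estimator, and rescale (calibrate) those gradients with a correction factor. The k-bit clamp-and-round scheme and the constant calibration factor used here are assumptions for illustration, not the thesis's actual Hierarchical Bit-Wise Quantization or Dynamic Gradient Calibration rules.

```python
import torch

class CalibratedQuant(torch.autograd.Function):
    """Illustrative only: k-bit uniform quantization in the forward pass,
    straight-through gradients rescaled by a caller-supplied calibration
    factor. How that factor would be chosen or updated is hypothetical."""

    @staticmethod
    def forward(ctx, x, k, calib):
        ctx.calib = calib
        levels = 2 ** k - 1
        xc = torch.clamp(x, 0.0, 1.0)             # assume inputs normalized to [0, 1]
        return torch.round(xc * levels) / levels  # k-bit uniform quantization

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through estimator, scaled by the calibration factor.
        return grad_out * ctx.calib, None, None

x = torch.rand(8, requires_grad=True)
y = CalibratedQuant.apply(x, 2, 0.9)   # 2-bit quantization, calibration factor 0.9
y.sum().backward()
print(x.grad)                          # gradients passed straight through, scaled by 0.9
```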