
Graduate Student: Chen, Yi-Ren (陳奕仁)
Thesis Title: Using Sigmoid Soft Round (SSR) to Design a Hardware-Friendly Quantization Algorithm for Efficient Integer-Only Inference (使用S函數軟性四捨五入設計對純整數計算推論硬體友善的量化演算法)
Advisor: Tang, Kea-Tiong (鄭桂忠)
Oral Defense Committee: Huang, Chao-Tsung (黃朝宗); Lu, Chih-Cheng (盧峙丞)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Academic Year of Graduation: 109
Language: Chinese
Number of Pages: 44
Chinese Keywords: 深度學習、模型壓縮、量化
Foreign Keywords: Deep Learning, Model Compression, Quantization
Abstract (Chinese):
Deep neural networks can be applied in many fields, but the large amounts of storage and computation they require have become a bottleneck for deployment on edge devices. To reduce the resource consumption of deep neural networks on edge devices, research on model compression algorithms has flourished. Among these, quantization algorithms not only lower memory requirements but also simplify the computational complexity of the hardware, and have therefore become the focus of many researchers.
In this work, to let deep neural networks use fewer hardware resources while maintaining high accuracy, we propose a quantization method that quantizes all parameters required for network inference. First, by examining the gradient problems of common quantization functions, we design a differentiable rounding function that takes the quantization gradient into account, called Sigmoid Soft Round (SSR). Second, to speed up network inference on edge devices, we design a weight quantizer that fuses the batch normalization parameters, can be applied to a variety of quantization methods, and trains faster. In addition, we design an activation quantizer that uses fewer hardware resources while preserving network performance.
We validate the proposed methods by applying them to two small networks, VGG7 and ResNet20, and two large networks, VGG16 and ResNet18, on CIFAR-10 image classification. The results show that, with 8-bit weights and activations, the proposed weight and activation quantizers lose at most 1.01% accuracy (on ResNet18) relative to the full-precision networks, while on VGG7 accuracy improves by 0.36%. Adding the proposed Sigmoid Soft Round, the largest accuracy drop is only 0.32% (on ResNet20), while on VGG7 accuracy improves by 0.53%.


Abstract (English):
Deep neural networks (DNNs) can be applied in various fields. However, their large storage overhead and substantial computation cost have become a bottleneck for hardware accelerators. To reduce the resource consumption of DNNs on edge devices, many researchers have become interested in model compression algorithms. Among model compression algorithms, quantization not only reduces the demand for storage but also simplifies computational complexity.
In this work, we propose quantization methods that quantize all parameters required in network inference, letting DNNs use fewer hardware resources while maintaining high performance. First, we propose a differentiable quantization function, called Sigmoid Soft Round (SSR), that takes the gradient of the quantization function into account. Furthermore, to accelerate DNN inference on hardware, we propose a weight quantizer with batch normalization fusion that can be applied to a variety of quantization methods and requires less training time. In addition, we propose an activation quantizer that uses fewer hardware resources while keeping the performance of DNNs high.
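The abstract does not give SSR's exact formula. As a minimal sketch of the general idea of a sigmoid-based soft round, assuming a common formulation in which a rescaled sigmoid replaces the hard rounding step within each unit interval, the PyTorch snippet below shows how such a function stays differentiable while approaching hard rounding as a temperature grows; the function name, the `temperature` parameter, and the rescaling are illustrative assumptions, not the thesis' definition.

```python
import torch

def sigmoid_soft_round(x: torch.Tensor, temperature: float = 10.0) -> torch.Tensor:
    """Illustrative sigmoid-based soft round (not necessarily the thesis' exact SSR).

    Within each unit interval [n, n+1], a sigmoid rescaled to pass through
    (n, n) and (n+1, n+1) replaces the hard rounding step, so the output
    approaches round(x) as `temperature` grows while its gradient stays
    nonzero everywhere.
    """
    floor = torch.floor(x)                        # integer part (carries no gradient)
    frac = x - floor                              # fractional part in [0, 1)
    s = torch.sigmoid(temperature * (frac - 0.5))
    lo = torch.sigmoid(torch.tensor(-0.5 * temperature))
    hi = torch.sigmoid(torch.tensor(0.5 * temperature))
    return floor + (s - lo) / (hi - lo)           # smooth step from n to n+1

# Unlike torch.round, gradients flow through the soft step:
x = torch.linspace(-2.0, 2.0, steps=9, requires_grad=True)
y = sigmoid_soft_round(x)
y.sum().backward()
print(y.detach())
print(x.grad)
```

In a quantizer, such a function would take the place of hard rounding inside the usual scale-clamp-round pipeline, so the quantization step itself contributes a gradient during training instead of blocking it.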
We evaluated the proposed quantization methods on CIFAR-10 image classification with two small networks, VGG7 and ResNet20, and two large networks, VGG16 and ResNet18. The results show that, with the proposed weight and activation quantizers at a bit-width of 8, the largest accuracy drop is 1.01% on ResNet18, while the largest accuracy gain is 0.36% on VGG7, compared to the full-precision models. Coupled with SSR, the largest accuracy drop is 0.32% on ResNet20, while the largest accuracy gain is 0.53% on VGG7.
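For the weight quantizer with batch normalization fusion mentioned above, the standard inference-time batch-norm folding gives the flavor of what gets fused: the per-channel BN scale and shift are absorbed into the convolution's weights and bias, so integer-only inference only ever sees the fused tensors. The sketch below is that standard folding under the usual assumptions (affine BN with tracked running statistics); the thesis' training-time fusion inside its quantizer may differ, and `fuse_conv_bn` is an illustrative helper, not code from the thesis.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm statistics into the preceding convolution (standard folding).

    After folding, the returned conv alone reproduces conv+BN at inference
    time, so only the fused weights and bias need to be quantized.
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)    # per-channel gamma / sigma
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)
    return fused

# Example on a hypothetical conv/BN pair: fused output matches conv followed by BN.
conv = nn.Conv2d(16, 32, 3, padding=1, bias=False).eval()
bn = nn.BatchNorm2d(32).eval()
x = torch.randn(1, 16, 8, 8)
print(torch.allclose(fuse_conv_bn(conv, bn)(x), bn(conv(x)), atol=1e-5))
```

Quantizing the fused weights rather than the raw ones means the hardware needs only an integer convolution followed by a bias add, with no separate floating-point normalization stage.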

Table of Contents:
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1  Research Background
  1.2  Research Motivation and Objectives
  1.3  Chapter Overview
Chapter 2  Literature Review
  2.1  Model Compression Algorithms
  2.2  Quantized Neural Networks
    2.2.1  Quantization of Network Units
    2.2.2  Quantization Methods
    2.2.3  Types of Activation Functions
  2.3  Weights and Batch Normalization
Chapter 3  Hardware-Friendly Quantization Algorithm for Integer-Only Inference
  3.1  Sigmoid Soft Round
  3.2  Weight Quantizer with Batch Normalization Parameter Fusion
  3.3  Activation Quantizer
Chapter 4  Experimental Results
  4.1  Experimental Setup
    4.1.1  Dataset and Preprocessing
    4.1.2  Network Architectures and Hyperparameter Settings
    4.1.3  Software and Hardware Environment
  4.2  Comparison of Sigmoid Soft Round Results
  4.3  Comparison of Weight Quantization Results with Batch Normalization Parameter Fusion
  4.4  Comparison of Activation Quantization Results
  4.5  Comparison of Combined Weight (with Batch Normalization Fusion) and Activation Quantization Results
  4.6  Comparison of All Methods Combined
Chapter 5  Conclusion and Future Work
References

