
Graduate Student: Lu, Tsung-Yin (呂宗穎)
Thesis Title: A Normalization-based Logarithmic Quantization Approach for Convolution Neural Network Miniaturization
Advisor: Tsay, Ren-Song (蔡仁松)
Committee Members: Hu, Min-Chun (胡敏君); Mak, Wai-Kei (麥偉基)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2019
Graduating Academic Year: 108
Language: English
Number of Pages: 42
Chinese Keywords: convolutional neural network, efficient execution, low bit-width
Foreign Keywords: efficient inference, number representation, low bit-width


In this paper, we propose an effective normalization-based logarithmic quantization method that converts pre-trained CNN models to very low bit-width representations while maintaining almost the same accuracy. The key idea is to normalize the data values of each layer to the same range, which minimizes resource requirements by allowing a unified, reusable CNN convolution hardware unit. The proposed approach has been extensively evaluated on several popular CNN models (AlexNet, VGG16, and ResNet-18/34). The results show that our method achieves significant reductions in both model size and computation resources with minimal accuracy loss. Moreover, compared with existing quantization schemes, our approach achieves higher accuracy under the same bit-width constraint while requiring minimal design effort. The resulting design has very low complexity and high power efficiency, making it well suited for edge computing.
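
As a rough illustration of the idea summarized above, the Python sketch below quantizes one layer's weights onto a signed power-of-two grid after normalizing them by the layer's maximum absolute value. The normalization choice, the bit allocation, and the exponent range are assumptions made for illustration and are not taken from the thesis.

    import numpy as np

    def log_quantize_layer(weights, bit_width=4):
        # Illustrative sketch of normalization-based logarithmic quantization;
        # the per-layer max-absolute normalization and the exponent clipping
        # range below are assumptions, not the thesis's exact algorithm.
        scale = np.max(np.abs(weights))      # per-layer normalization factor
        w_norm = weights / scale             # normalized values lie in [-1, 1]

        # Encode each value as sign * 2^e with a small integer exponent e.
        # One bit holds the sign; the remaining (bit_width - 1) bits index
        # 2^(bit_width - 1) exponent levels in {-(levels - 1), ..., 0}.
        levels = 2 ** (bit_width - 1)
        sign = np.sign(w_norm)
        exp = np.rint(np.log2(np.abs(w_norm) + 1e-12))
        exp = np.clip(exp, -(levels - 1), 0)

        # Power-of-two approximation, rescaled back to the layer's range.
        w_quant = sign * np.exp2(exp) * scale
        return w_quant, sign.astype(np.int8), exp.astype(np.int8), scale

Because every quantized value is a signed power of two, the multiplications inside a convolution reduce to bit shifts, which is consistent with the abstract's point that a single normalized, reusable convolution unit can serve all layers.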

I. Introduction ............................................ 5
II. Related Work .......................................... 12
   A. Network Pruning Approach ............................ 12
   B. Network Quantization Approach ....................... 13
III. Method ............................................... 17
   A. Notations ........................................... 17
   B. Normalization-based Logarithmic Quantization Scheme . 18
   C. Activation Quantization ............................. 25
   D. Training Algorithm with Quantization Representation . 29
IV. Experiment ............................................ 31
   A. Quantization Error Metric with Accuracy Loss ........ 31
   B. Retraining for Model Accuracy ....................... 33
   C. Image Classification Results ........................ 34
   D. Comparison with Other Quantization Methods .......... 36
   E. Object Detection Results ............................ 37
V. Conclusion ............................................. 38
VI. References ............................................ 39

