
Graduate Student: Chin, Hsu-Hsun (金緒勳)
Thesis Title: A High-Performance Adaptive Quantization Approach for Edge CNN Applications
Advisor: Tsay, Ren-Song (蔡仁松)
Committee Members: Hu, Min-Chun (胡敏君); Yeh, Jen-Chieh (葉人傑)
Degree: Master
Department: Computer Science, College of Electrical Engineering and Computer Science
Publication Year: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 25
Keywords: Convolutional neural network, quantization, low bit-width

Abstract:

    Recent convolutional neural network (CNN) development continues to advance state-of-the-art model accuracy for various applications. However, the improved accuracy comes at the cost of substantial memory bandwidth, storage requirements, and demanding computational resources. Although past quantization methods have effectively reduced the deployment cost for edge devices, they suffer from significant information loss when processing the biased activations of contemporary CNNs. In this paper, we therefore introduce an adaptive, high-performance quantization method that resolves the biased-activation issue by dynamically adjusting the scaling and shifting factors based on the task loss. Our proposed method has been extensively evaluated on image classification models (ResNet-18/34/50, MobileNet-V2, EfficientNet-B0) with the ImageNet dataset, an object detection model (YOLO-V4) with the COCO dataset, and language models with the PTB dataset. The results show that our 4-bit integer (INT4) quantized models achieve better accuracy than the state-of-the-art 4-bit models and, in some cases, even surpass the golden full-precision models. The final designs have been successfully deployed onto extremely resource-constrained edge devices for many practical applications.
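
    To make the mechanism described in the abstract more concrete, below is a minimal PyTorch sketch of an activation fake-quantizer with a learnable scaling factor and shifting factor trained by the task loss through a straight-through estimator. It only illustrates the general idea, assuming a symmetric signed INT4 range; the class name AdaptiveActQuant, the parameter initialization, and all other details are illustrative assumptions and are not taken from the thesis.

# Minimal sketch (PyTorch) of learnable scale-and-shift activation quantization.
# All names, initial values, and details are illustrative assumptions, not the
# exact formulation used in the thesis.
import torch
import torch.nn as nn


class AdaptiveActQuant(nn.Module):
    """Activation fake-quantizer with a learnable scale and shift (illustrative)."""

    def __init__(self, num_bits: int = 4):
        super().__init__()
        self.qmin = -(2 ** (num_bits - 1))        # -8 for INT4
        self.qmax = 2 ** (num_bits - 1) - 1       # +7 for INT4
        self.scale = nn.Parameter(torch.tensor(1.0))  # learnable scaling factor
        self.shift = nn.Parameter(torch.tensor(0.0))  # learnable shifting factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Shift and scale so that biased (non-zero-centered) activations map
        # into the representable integer range before rounding.
        z = (x - self.shift) / self.scale
        z = torch.clamp(z, self.qmin, self.qmax)
        # Straight-through estimator for rounding: the forward pass uses
        # round(z), the backward pass treats rounding as identity, so the
        # task loss can update the scale, and the shift via clipped values.
        z = (z.round() - z).detach() + z
        # Dequantize back to the original activation range.
        return z * self.scale + self.shift


if __name__ == "__main__":
    quant = AdaptiveActQuant(num_bits=4)
    x = torch.linspace(-10.0, 10.0, steps=16)   # includes values that get clipped
    y = quant(x)
    y.sum().backward()                          # stand-in for a real task loss
    print("scale grad:", quant.scale.grad.item(),
          "shift grad:", quant.shift.grad.item())

    In a full model, one such module could sit in front of each quantized convolution so that its scale and shift are learned together with the weights during fine-tuning.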

    Table of Contents:
    I. INTRODUCTION 5
    II. RELATED WORK 10
       A. Architecture Optimization Methods 10
       B. Network Pruning Methods 10
       C. Network Quantization Methods 11
    III. PRELIMINARY 12
    IV. THE PROPOSED METHOD 13
       A. Forward Propagation 13
       B. Backward Propagation 15
    V. EXPERIMENT 18
       A. Image Classification 18
       B. Object Detection 19
       C. Language Model 20
    VI. CONCLUSION 22

