
Graduate Student: Cheng, Cheng-Hsuan (鄭丞軒)
Thesis Title: Low Power Design of Convolutional Neural Networks Using Approximate Computing and Computation Skipping (運用近似運算及省略運算之卷積神經網路低功耗設計)
Advisor: Liu, Ren-Shuo (呂仁碩)
Committee Members: Liou, Jing-Jia (劉靖家); Hsu, Yarsun (許雅三)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107
Language: English
Number of Pages: 31
Chinese Keywords: 卷積神經網路、近似運算、內積、全連接層、動態記憶體、低功耗
Foreign Keywords: Convolutional neural network, Approximate computing, Inner product, Fully-connected layer, DRAM, Low power
    Convolutional neural networks (CNNs) have many attractive applications, such as image classification, object detection, and image segmentation. The devices that run CNNs are not only servers and cloud platforms equipped with powerful compute resources (e.g., GPUs) but also low-power devices that carry only embedded CNN accelerators, such as smartphones, drones, and IoT devices. To improve accuracy, CNN models tend to be trained with large numbers of parameters and multiply-accumulate operations, and they can easily consume more energy than low-power devices can afford; saving the energy spent on running CNNs is therefore an important issue. Most CNN models consist of two parts: convolutional (CONV) layers and fully-connected (FC) layers. In this thesis, I adopt two techniques, DrowsyNet and my proposed AIP, to reduce the energy consumption of CONV layers and FC layers, respectively. Both are inference-stage techniques.

    AIP approximates the inner products of the fully-connected layers in CNNs by computing them with only a small fraction of the parameters (one-sixteenth of them). I observe that the fully-connected layers of CNNs have several characteristics that naturally fit AIP: the dropout training strategy, rectified linear units (ReLUs), and the top-n operator. Experimental results show that 48% of the DRAM access energy can be saved at the cost of only a 2% top-5 accuracy loss (for VGG-f).

    DrowsyNet randomly zeroes out convolutional neurons to trade off energy against accuracy; this differs from the zeroing strategy used in the traditional training stage (i.e., dropout). DrowsyNet further presents a non-uniform configuration strategy, in which each convolutional layer uses a different random zeroing rate, as opposed to the uniform strategy, in which every convolutional layer uses the same rate. Experimental results show that giving up 11% of top-5 accuracy can save 50% of the convolution operations. I also extend DrowsyNet with additional experiments on more recent CNN models.

    The design and results of AIP were presented in an oral talk at the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2019), held March 18–20, 2019, in Hsinchu, Taiwan.


    Convolutional neural networks (CNNs) have many attractive applications (e.g., image classification, object detection, and segmentation), not only on servers and clouds that are equipped with powerful GPUs but also on lightweight, low-power devices (e.g., smartphones and IoT devices) that are equipped only with embedded CNN accelerators. To offer higher accuracy, CNN models tend to contain large numbers of parameters and multiply-accumulate operations and may therefore consume more power than low-power devices can afford. Saving the power consumption of CNNs is thus crucial. Most CNNs are composed of two parts, convolutional (CONV) layers and fully-connected (FC) layers. In this thesis, I adopt two strategies, DrowsyNet and my proposed AIP, to reduce the energy consumption of CONV layers and FC layers, respectively. Both are inference-stage techniques.

    AIP approximates the inner products of CNNs' FC layers by using only a small fraction (e.g., one-sixteenth) of the parameters. I observe that FC layers possess several characteristics that naturally fit AIP: the dropout training strategy, rectified linear units (ReLUs), and the top-n operator. Experimental results show that 48% of DRAM access energy can be saved at the cost of only a 2% top-5 accuracy loss (for VGG-f).
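    As a rough illustration of the idea, the sketch below computes an FC layer's outputs from only a contiguous one-sixteenth slice of the input activations and weight rows, then applies ReLU and a top-n selection. It is a minimal NumPy model written for this summary: the function name aip_fc_layer, the choice to scale the bias by the same fraction (loosely echoing the "Proportional Bias Scaling" step in the table of contents), and the random test data are my own assumptions rather than the thesis's exact algorithm or hardware flow.

        import numpy as np

        def aip_fc_layer(x, W, b, fraction=1.0 / 16, top_n=5):
            # Use only a contiguous slice of the input activations and weight
            # rows; fetching fewer weights from DRAM is where the energy
            # saving comes from.
            k = max(1, int(round(W.shape[0] * fraction)))
            partial = x[:k] @ W[:k, :]                        # approximate inner products
            scores = np.maximum(partial + b * fraction, 0.0)  # ReLU (bias scaling is an assumption)
            top_idx = np.argsort(scores)[-top_n:][::-1]       # top-n operator (e.g., top-5 labels)
            return scores, top_idx

        # Illustrative usage with random data shaped like a VGG-style FC layer.
        rng = np.random.default_rng(0)
        x = rng.random(4096)
        W = rng.standard_normal((4096, 1000)) * 0.01
        b = np.zeros(1000)
        scores, top5 = aip_fc_layer(x, W, b)

    Reading only a leading, contiguous slice keeps the DRAM accesses sequential, which appears to be the motivation behind the "Contiguous Parameter Selection" step listed in Chapter 3.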

    DrowsyNet randomly drops out a fraction of convolutional neurons to achieve a power-accuracy tradeoff; this technique is different from traditional training-stage dropout. DrowsyNet further shows that the dropout rates are better set non-uniformly across convolutional layers than uniformly. Experimental results show that up to 50% of the convolutions can be saved by trading away less than 11% of top-5 accuracy.
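    The sketch below is a functional NumPy model of this inference-stage dropout under my own assumptions: the random mask is applied to a CONV layer's output feature map (in hardware, the masked neurons' convolutions would simply be skipped rather than computed and then zeroed), and the per-layer rates shown for the non-uniform configuration are illustrative values, not the settings evaluated in the thesis.

        import numpy as np

        def drowsy_layer(feature_map, drop_rate, rng):
            # Functional model: randomly zero out a fraction of convolutional
            # neurons.  In hardware, the convolutions of the masked neurons are
            # skipped outright, which is what cuts the convolution count and the
            # energy; here the effect is emulated by zeroing the outputs.
            keep_mask = rng.random(feature_map.shape) >= drop_rate
            return feature_map * keep_mask

        rng = np.random.default_rng(0)

        # Uniform configuration: every CONV layer shares one drop rate.
        uniform_rates = [0.5] * 5

        # Non-uniform configuration: each CONV layer gets its own drop rate
        # (illustrative values only).
        non_uniform_rates = [0.1, 0.3, 0.5, 0.6, 0.6]

        feature_maps = [rng.random((64, 56, 56)) for _ in range(5)]
        outputs = [drowsy_layer(fm, r, rng)
                   for fm, r in zip(feature_maps, non_uniform_rates)]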

    The design and evaluation results of AIP have been accepted as an oral presentation at the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2019), held March 18–20, 2019, in Hsinchu, Taiwan.

    Acknowledgements
    Abstract (Chinese)
    Abstract
    1 Introduction
      1.1 AIP 1
        1.1.1 Energy Evaluation and Hardware Architecture Model for AIP 3
      1.2 DrowsyNet 3
    2 Background 5
    3 AIP Design 7
      3.1 Inference-Phase, Runtime-Adjustable Design 7
      3.2 Two-Pass FC Computing 8
      3.3 Exploiting the Headroom Created by Dropout 8
      3.4 Contiguous Parameter Selection 9
      3.5 Proportional Bias Scaling 10
      3.6 AIP Hardware Architecture Model 10
    4 DrowsyNet Design 13
      4.1 Drowsy Neuron Model 13
      4.2 Uniform DrowsyNet 14
      4.3 Non-Uniform DrowsyNet 14
    5 AIP Evaluation 15
      5.1 Evaluate Energy Consumption 17
      5.2 Experimental Results 20
    6 DrowsyNet Evaluation 25
    7 Related Works 27
      7.1 Approximate Convolutions 27
      7.2 Approximate Multiplications 28
    8 Conclusions 29
    References 30
