| Field | Content |
|---|---|
| Student | 鄭丞軒 (Cheng, Cheng-Hsuan) |
| Thesis Title | 運用近似運算及省略運算之卷積神經網路低功耗設計 (Low Power Design of Convolutional Neural Networks Using Approximate Computing and Computation Skipping) |
| Advisor | 呂仁碩 (Liu, Ren-Shuo) |
| Committee Members | 劉靖家 (Liou, Jing-Jia); 許雅三 (Hsu, Yarsun) |
| Degree | Master |
| Department | 電機資訊學院 電機工程學系 (Department of Electrical Engineering, College of Electrical Engineering and Computer Science) |
| Year of Publication | 2019 |
| Graduation Academic Year | 107 (ROC academic year 2018–2019) |
| Language | English |
| Number of Pages | 31 |
| Keywords (Chinese) | 卷積神經網路、近似運算、內積、全連接層、動態記憶體、低功耗 |
| Keywords (English) | Convolutional neural network, Approximate computing, Inner product, Fully-connected layer, DRAM, Low power |
Convolutional neural networks (CNNs) have many attractive applications, such as image recognition, object detection, and image segmentation. The devices that run CNNs include not only servers and cloud platforms equipped with powerful computing resources (e.g., GPUs) but also low-power devices equipped only with embedded CNN accelerators, such as smartphones, drones, and IoT devices. To improve accuracy, CNN models tend to be trained with large numbers of parameters and multiply-accumulate operations, and they can easily consume more energy than low-power devices can afford; reducing the energy consumed by running CNNs is therefore an important issue. Most CNN models consist of two parts: convolutional (CONV) layers and fully-connected (FC) layers. In this thesis, I adopt two techniques, DrowsyNet and my proposed AIP, to reduce the energy consumption of CONV layers and FC layers, respectively. Both techniques are applied at the inference stage.
AIP approximates the inner products of a CNN's FC layers by computing them with only a small fraction (one-sixteenth) of the parameters. I observe that the FC layers of CNNs possess several characteristics that naturally fit AIP: the dropout training strategy, rectified linear units (ReLUs), and the top-n operator. Experimental results show that 48% of DRAM access energy can be saved at the cost of only a 2% top-5 accuracy loss (for VGG-f).
DrowsyNet randomly zeroes out convolutional neurons to trade accuracy for energy; this differs from the zeroing strategy used at the traditional training stage (i.e., dropout). DrowsyNet further presents a non-uniform configuration strategy, in which each convolutional layer uses a different random zeroing rate, in contrast to the uniform strategy, in which every convolutional layer uses the same rate. Experimental results show that giving up 11% of top-5 accuracy saves 50% of the convolution operations. I also extend DrowsyNet with additional experiments that apply it to more recent CNN models.
I presented the design and results of AIP orally at the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), held March 18–20, 2019, in Hsinchu, Taiwan.
Convolutional neural networks (CNNs) have many attractive applications (e.g., image classification, object detection, and segmentation), not only on servers and cloud platforms equipped with powerful GPUs but also on lightweight, low-power devices (e.g., smartphones and IoT devices) equipped with embedded CNN accelerators. To offer higher accuracy, CNN models may consume more power than low-power devices can afford. Therefore, reducing the power consumption of CNNs is crucial. Most CNNs are composed of two parts, convolutional (CONV) layers and fully-connected (FC) layers. In this thesis, I adopt two strategies, DrowsyNet and my proposed AIP, to reduce the energy consumption of CONV layers and FC layers, respectively. Both are inference-stage techniques.
AIP approximates the inner products of CNNs' FC layers using only a small fraction (e.g., one-sixteenth) of the parameters. I observe that FC layers possess several characteristics that naturally fit AIP: the dropout training strategy, rectified linear units (ReLUs), and the top-n operator. Experimental results show that 48% of DRAM access energy can be saved at the cost of only a 2% top-5 accuracy loss (for VGG-f).
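This abstract does not detail how AIP selects its fraction of the parameters or rescales the partial sums, so the NumPy sketch below is only a minimal illustration of the general idea of approximating FC inner products with a subset of the weights; the function names `approximate_fc_layer` and `top_n`, the leading-column weight selection, and the d/k rescaling factor are assumptions made for illustration, not necessarily AIP's actual scheme.

```python
import numpy as np

def approximate_fc_layer(x, W, b, fraction=1.0 / 16.0):
    """Estimate a fully-connected layer's inner products from a subset of weights.

    Instead of the full product W @ x, each output neuron is estimated from
    only a `fraction` of the input dimensions and rescaled, so only that
    fraction of the weights needs to be fetched from DRAM.
      x: input activations, shape (d,)
      W: weight matrix, shape (n_out, d)
      b: bias vector, shape (n_out,)
    """
    d = W.shape[1]
    k = max(1, int(round(d * fraction)))   # number of input dimensions actually used
    partial = W[:, :k] @ x[:k]             # partial inner products over the kept weights
    estimate = partial * (d / k) + b       # rescale partial sums to approximate the full sums
    return np.maximum(estimate, 0.0)       # ReLU discards negative (coarsely estimated) outputs

def top_n(scores, n=5):
    """Indices of the n largest scores, e.g., for a top-5 prediction."""
    return np.argsort(scores)[::-1][:n]
```

The intuition matches the characteristics listed above: a dropout-trained FC layer is already robust to missing a subset of its inputs, ReLUs clamp negative estimates to zero anyway, and only the relative order of the largest outputs matters for a top-n prediction.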
DrowsyNet randomly drops out a fraction of convolutional neurons to achieve a power-accuracy tradeoff; this differs from traditional training-stage dropout. DrowsyNet further shows that the dropout rates are better set non-uniformly across convolutional layers. Experimental results show that up to 50% of the convolutions can be saved by trading away less than 11% of top-5 accuracy.
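As a rough, hedged illustration of inference-stage dropout with per-layer rates, the NumPy sketch below draws a random keep-mask over a convolutional layer's output neurons; in DrowsyNet the convolutions of the dropped neurons would be skipped rather than computed and then masked, and the layer names and rates shown here are hypothetical, not the thesis's tuned configuration.

```python
import numpy as np

def drowsy_keep_mask(output_shape, drop_rate, rng):
    """Random keep-mask over a CONV layer's output neurons.

    In an accelerator, the convolutions producing the masked-out neurons
    would be skipped entirely; here, for clarity, the mask is applied to an
    already-computed feature map.
    """
    return (rng.random(output_shape) >= drop_rate).astype(np.float32)

# Hypothetical non-uniform configuration: each CONV layer gets its own drop rate.
layer_drop_rates = {"conv1": 0.1, "conv2": 0.3, "conv3": 0.5}

rng = np.random.default_rng(seed=0)
feature_map = rng.standard_normal((64, 28, 28)).astype(np.float32)  # (channels, H, W)
mask = drowsy_keep_mask(feature_map.shape, layer_drop_rates["conv2"], rng)
masked_output = feature_map * mask
skipped_fraction = 1.0 - mask.mean()  # ≈ drop rate: share of convolutions that could be skipped
```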
The design and evaluation results of AIP have been accepted as an oral presentation at the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2019), held March 18–20, 2019, in Hsinchu, Taiwan.