| Field | Content |
|---|---|
| Student | 鄭丞軒 (Cheng, Cheng-Hsuan) |
| Thesis Title | 運用近似運算及省略運算之卷積神經網路低功耗設計 (Low Power Design of Convolutional Neural Networks Using Approximate Computing and Computation Skipping) |
| Advisor | 呂仁碩 (Liu, Ren-Shuo) |
| Committee Members | 劉靖家 (Liou, Jing-Jia); 許雅三 (Hsu, Yarsun) |
| Degree | Master |
| Department | 電機資訊學院 電機工程學系 (Department of Electrical Engineering, College of Electrical Engineering and Computer Science) |
| Year of Publication | 2019 |
| Graduation Academic Year | 107 (ROC academic year 2018–2019) |
| Language | English |
| Number of Pages | 31 |
| Keywords (Chinese) | 卷積神經網路、近似運算、內積、全連接層、動態記憶體、低功耗 |
| Keywords (English) | Convolutional neural network, Approximate computing, Inner product, Fully-connected layer, DRAM, Low power |
Convolutional neural networks (CNNs) have many attractive applications, such as image recognition, object detection, and image segmentation. The devices that run CNNs include not only servers and cloud platforms equipped with powerful computing resources (e.g., GPUs) but also low-power devices equipped only with embedded CNN accelerators, such as smartphones, drones, and IoT devices. To improve accuracy, CNN models tend to be trained with large numbers of parameters and multiply-accumulate operations, and they can easily consume more energy than low-power devices can afford; reducing the energy consumed by running CNNs is therefore an important issue. Most CNN models consist of two parts: convolutional (CONV) layers and fully-connected (FC) layers. In this thesis, I adopt two techniques, DrowsyNet and my proposed AIP, to reduce the energy consumption of CONV layers and FC layers, respectively. Both techniques are applied at the inference stage.
AIP approximates the inner products of a CNN's FC layers by computing them with only a small fraction (one-sixteenth) of the parameters. I observe that the FC layers of CNNs possess several characteristics that naturally fit AIP: the dropout training strategy, rectified linear units (ReLUs), and the top-n operator. Experimental results show that 48% of DRAM access energy can be saved at the cost of only a 2% top-5 accuracy loss (for VGG-f).
DrowsyNet randomly zeroes out convolutional neurons to trade accuracy for energy; this differs from the zeroing strategy used at the traditional training stage (i.e., dropout). DrowsyNet further presents a non-uniform configuration strategy, in which each convolutional layer uses a different random zeroing rate, in contrast to the uniform strategy, in which every convolutional layer uses the same rate. Experimental results show that giving up 11% of top-5 accuracy saves 50% of the convolution operations. I also extend DrowsyNet with additional experiments that apply it to more recent CNN models.
I presented the design and results of AIP orally at the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), held March 18–20, 2019, in Hsinchu, Taiwan.
Convolutional neural networks (CNNs) have many attractive applications (e.g., image classification, object detection, and segmentation), not only on servers and cloud platforms equipped with powerful GPUs but also on lightweight, low-power devices (e.g., smartphones and IoT devices) equipped with embedded CNN accelerators. To offer higher accuracy, CNN models may consume more power than low-power devices can afford. Therefore, reducing the power consumption of CNNs is crucial. Most CNNs are composed of two parts, convolutional (CONV) layers and fully-connected (FC) layers. In this thesis, I adopt two strategies, DrowsyNet and my proposed AIP, to reduce the energy consumption of CONV layers and FC layers, respectively. Both are inference-stage techniques.
AIP approximates the inner products of CNNs' FC layers using only a small fraction (e.g., one-sixteenth) of the parameters. I observe that FC layers possess several characteristics that naturally fit AIP: the dropout training strategy, rectified linear units (ReLUs), and the top-n operator. Experimental results show that 48% of DRAM access energy can be saved at the cost of only a 2% top-5 accuracy loss (for VGG-f).
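This abstract does not detail how AIP selects its fraction of the parameters or rescales the partial sums, so the NumPy sketch below is only a minimal illustration of the general idea of approximating FC inner products with a subset of the weights; the function names `approximate_fc_layer` and `top_n`, the leading-column weight selection, and the d/k rescaling factor are assumptions made for illustration, not necessarily AIP's actual scheme.

```python
import numpy as np

def approximate_fc_layer(x, W, b, fraction=1.0 / 16.0):
    """Estimate a fully-connected layer's inner products from a subset of weights.

    Instead of the full product W @ x, each output neuron is estimated from
    only a `fraction` of the input dimensions and rescaled, so only that
    fraction of the weights needs to be fetched from DRAM.
      x: input activations, shape (d,)
      W: weight matrix, shape (n_out, d)
      b: bias vector, shape (n_out,)
    """
    d = W.shape[1]
    k = max(1, int(round(d * fraction)))   # number of input dimensions actually used
    partial = W[:, :k] @ x[:k]             # partial inner products over the kept weights
    estimate = partial * (d / k) + b       # rescale partial sums to approximate the full sums
    return np.maximum(estimate, 0.0)       # ReLU discards negative (coarsely estimated) outputs

def top_n(scores, n=5):
    """Indices of the n largest scores, e.g., for a top-5 prediction."""
    return np.argsort(scores)[::-1][:n]
```

The intuition matches the characteristics listed above: a dropout-trained FC layer is already robust to missing a subset of its inputs, ReLUs clamp negative estimates to zero anyway, and only the relative order of the largest outputs matters for a top-n prediction.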
DrowsyNet randomly drops out a fraction of convolutional neurons to achieve a power-accuracy tradeoff; this differs from traditional training-stage dropout. DrowsyNet further shows that the dropout rates are better set non-uniformly across convolutional layers. Experimental results show that up to 50% of the convolutions can be saved by trading away less than 11% of top-5 accuracy.
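As a rough, hedged illustration of inference-stage dropout with per-layer rates, the NumPy sketch below draws a random keep-mask over a convolutional layer's output neurons; in DrowsyNet the convolutions of the dropped neurons would be skipped rather than computed and then masked, and the layer names and rates shown here are hypothetical, not the thesis's tuned configuration.

```python
import numpy as np

def drowsy_keep_mask(output_shape, drop_rate, rng):
    """Random keep-mask over a CONV layer's output neurons.

    In an accelerator, the convolutions producing the masked-out neurons
    would be skipped entirely; here, for clarity, the mask is applied to an
    already-computed feature map.
    """
    return (rng.random(output_shape) >= drop_rate).astype(np.float32)

# Hypothetical non-uniform configuration: each CONV layer gets its own drop rate.
layer_drop_rates = {"conv1": 0.1, "conv2": 0.3, "conv3": 0.5}

rng = np.random.default_rng(seed=0)
feature_map = rng.standard_normal((64, 28, 28)).astype(np.float32)  # (channels, H, W)
mask = drowsy_keep_mask(feature_map.shape, layer_drop_rates["conv2"], rng)
masked_output = feature_map * mask
skipped_fraction = 1.0 - mask.mean()  # ≈ drop rate: share of convolutions that could be skipped
```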
The design and evaluation results of AIP have been accepted as an oral presentation at the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2019), held March 18–20, 2019, in Hsinchu, Taiwan.