
Graduate Student: 林暐翔 (Lin, Wei-Shiang)
Thesis Title: Accelerating Convolutional Neural Networks using Iterative Two-Pass Decomposition
Advisor: 黃稚存 (Huang, Chih-Tsun)
Committee Members: 吳誠文 (Wu, Cheng-Wen); 孫民 (Sun, Min)
Degree: Master
Department:
Publication Year: 2018
Graduation Academic Year: 106
Language: English
Number of Pages: 36
Chinese Keywords: Convolutional Neural Network, Acceleration, Low-Rank Approximation
Foreign-Language Keywords: Low Rank Approximation
    In recent years, convolutional neural networks (CNNs) have been widely applied in many fields. In computer vision in particular, researchers have proposed many state-of-the-art CNN models to solve problems such as image classification, detection, and segmentation. These models solve such problems effectively, but they also demand an enormous amount of computation. As CNN models grow deeper, their computational complexity grows as well, making it increasingly difficult to deploy them on resource-constrained devices such as smartphones or IoT devices.

    Accelerating CNN models has therefore become an important issue. CNN models are usually over-parameterized to make training easier; removing these redundant parameters reduces both the amount of computation and the model size. In this thesis, we use low-rank approximation to accelerate CNN inference.

    Low-rank approximation has been applied in many publications to remove the redundant parameters of CNNs. By factorizing the filter tensor of a convolutional layer into smaller vector representations, we can replace the original convolutional layer with several simpler layers, such as 1$\times$1 convolutions or depth-wise convolutions, and thereby reduce the overall computation. However, this conversion raises some problems. First, because the optimal rank for decomposing a tensor cannot be determined directly, a heuristic algorithm is needed to decide how much rank to assign to the tensor of each convolutional layer. Second, the "CP instability" phenomenon is easily encountered: when a CNN is decomposed with CP decomposition, fine-tuning becomes difficult. Once the network's accuracy drops because of the CP decomposition, recovering it through training becomes extremely hard.

    In this thesis, we propose a rank selection algorithm that allows CP decomposition to achieve better results, and we propose the Two-Pass decomposition method to avoid the CP instability phenomenon. According to the experimental results, our rank selection algorithm improves the accuracy of the decomposed network while still achieving the target speedup. The Two-Pass decomposition avoids the difficulties of training the decomposed network and improves its accuracy. In this thesis, we successfully achieve a 6$\times$ speedup on VGG16 with only a 1.20\% accuracy drop, and a 1.35$\times$ speedup on ResNet50 with only a 1.51\% accuracy drop.

    Because our method uses only 1$\times$1 convolutions and depth-wise convolutions, it can easily be ported to different deep learning frameworks. Moreover, since our method is a data-driven optimization that makes no assumptions about the decomposition method itself, it can easily be extended to other decomposition methods for accelerating CNNs.


    Convolutional Neural Networks (CNNs) have been widely used in recent years. They have shown their ability to handle a wide range of learning problems, especially in computer vision applications such as image classification, detection, and segmentation. Many state-of-the-art models have been proposed to solve a large variety of these problems. These existing models are powerful but also computationally expensive. As CNN models grow deeper and deeper, their computation cost also grows. The huge computation cost makes deep CNNs hard to deploy on resource-constrained devices such as smartphones or IoT devices.

    Accelerating CNNs has become a critical topic if we want to apply these powerful models in such applications. Since deep CNN models tend to be over-parameterized, reducing their redundancy helps accelerate the computation as well as compress the model size. In this work, we adopt the low-rank approximation technique to accelerate the inference of CNNs.

    Low-rank approximation has been used to reduce the redundancy of CNNs in many publications. By converting the filter tensors of convolutional layers into smaller vectors, we can substitute the original complex 2D convolutional layers with simpler layers such as 1$\times$1 convolutions and depth-wise convolutions, thereby decreasing the overall amount of computation. However, some issues arise. Because there is no direct algorithm to determine the optimal rank of a tensor, heuristics are needed to choose the ranks. The other problem is CP instability, which makes networks decomposed by CP decomposition hard to fine-tune. Once the accuracy drops due to the CP decomposition, it is difficult to recover by further training.
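
    As a concrete illustration, the following is a minimal sketch of the decomposed layer structure described above, assuming a TensorFlow/Keras-style API rather than the thesis's actual implementation; the function name, the `rank` argument, and the channel sizes are illustrative only.

        import tensorflow as tf

        def cp_decomposed_conv(x, rank, c_out, kernel_size=3, stride=1):
            """Approximate one dense kernel_size x kernel_size convolution with C_out
            filters by a rank-`rank` factorization: 1x1 conv -> depth-wise conv -> 1x1 conv."""
            # 1x1 convolution: project the C_in input channels down to `rank` channels.
            x = tf.keras.layers.Conv2D(rank, 1, use_bias=False)(x)
            # Depth-wise spatial convolution: one kernel_size x kernel_size filter per rank channel.
            x = tf.keras.layers.DepthwiseConv2D(kernel_size, strides=stride,
                                                padding="same", use_bias=False)(x)
            # 1x1 convolution: expand the rank channels back to C_out output channels.
            return tf.keras.layers.Conv2D(c_out, 1)(x)

        # Hypothetical usage: replace a 3x3, 256-channel convolution with a rank-64 factorization.
        # inputs = tf.keras.Input(shape=(56, 56, 256))
        # outputs = cp_decomposed_conv(inputs, rank=64, c_out=256)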

    Based on these observations, we propose a Rank Selection approach for efficient CP decomposition and a Two-Pass decomposition to avoid CP instability. According to the experimental results, our Rank Selection determines effective ranks with improved accuracy while maintaining the target speedup. The proposed Two-Pass decomposition technique avoids ineffective training on the decomposed layers and therefore yields decomposed models with higher accuracy. For deeper networks, an iterative Two-Pass decomposition is proposed to further improve the accuracy. In this work, we have achieved a 6$\times$ speedup on VGG16 with only a 1.20\% accuracy drop and a 1.35$\times$ speedup on ResNet50 with a 1.51\% accuracy drop.
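
    For intuition about where such speedups come from, a rough per-layer cost comparison is given below, assuming the 1$\times$1 / depth-wise / 1$\times$1 structure sketched earlier and ignoring bias terms (the exact cost model is in Section 1.3.4 of the thesis):

        \[
          \text{speedup} \;\approx\; \frac{d^{2}\, C_{in}\, C_{out}}{R\,\left(C_{in} + d^{2} + C_{out}\right)}
        \]

    where $d$ is the spatial kernel size, $C_{in}$ and $C_{out}$ are the input and output channel counts, and $R$ is the selected rank. For example, for a single 3$\times$3 layer with $C_{in}=C_{out}=512$, choosing $R \approx 380$ gives roughly a 6$\times$ reduction in multiply-accumulate operations for that layer; the whole-network speedups reported above come from selecting a suitable rank for every decomposed layer.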

    Compared with other works, our Two-Pass decomposition is more universal. It can be easily adopted in different deep learning frameworks because only 1$\times$1 convolutions and depth-wise convolutions are used. Our method optimizes performance with a data-driven approach and makes no assumptions about the specific decomposition method, so, unlike previous methods, it can be extended to other decomposition forms for accelerating deep neural networks.

    1 Introduction 1
        1.1 Convolutional Neural Network 2
        1.2 Related Works 3
            1.2.1 Low Rank Approximation 3
            1.2.2 Prune Weights 4
            1.2.3 Quantization 5
            1.2.4 Fast Algorithms 5
            1.2.5 Efficient Network Design 6
        1.3 Low Rank Approximation 6
            1.3.1 Preliminary 7
            1.3.2 CP Decomposition 7
            1.3.3 Decompose Convolutional Layer 8
            1.3.4 The Complexity and Speedup 10
            1.3.5 The CP Instability 11
        1.4 Contributions 12
    2 Proposed Methods 13
        2.1 Two-Pass Decomposition 13
        2.2 Proposed Iterative Decomposition Flow for DNNs 15
            2.2.1 Our Rank Selection Algorithm 16
            2.2.2 Iterative Two-Pass Decomposition Flow 17
    3 Experiments 19
        3.1 Speedup 20
        3.2 Decomposing VGG16 21
            3.2.1 Rank Selection 21
            3.2.2 Decomposing Sequence in Iterative Two-Pass Decomposition 21
            3.2.3 Comparing with Other Works 22
        3.3 Decomposing ResNet 24
    4 Discussion 26
        4.1 Data Dependence 26
        4.2 Freezing Layers in Iterative Finetune 27
        4.3 Iterative Sequence 27
    5 Conclusion 30
    6 Future Work 31

