| Graduate Student: | 朱伯軒 Chu, Bo-Shiuan |
|---|---|
| Thesis Title: | 基於優化器之低維壓縮應用於類神經網路加速 Low-rank Compression of Convolutional Neural Networks Using Optimizer-Based Rank Selection |
| Advisor: | 李哲榮 Lee, Che-Rung |
| Committee Members: | 陳煥宗 Chen, Hwann-Tzong; 王聖智 Wang, Sheng-Jyh |
| Degree: | 碩士 Master |
| Department: | 電機資訊學院 資訊工程學系 College of Electrical Engineering and Computer Science, Department of Computer Science |
| Publication Year: | 2021 |
| Graduation Academic Year: | 109 |
| Language: | English |
| Pages: | 34 |
| Chinese Keywords: | 壓縮 (compression), 類神經網路 (neural network), 維度 (rank), 加速 (acceleration), 卷積 (convolution), 張量 (tensor) |
| Foreign Keywords: | tensor, compression, optimizer, funnel, convolution, neural network |
Deep learning has made remarkable progress in many fields, most notably the achievements of deep convolutional networks in image recognition. However, the energy consumption and computational cost of deep convolutional networks are very high, which severely limits the settings in which the technique can be deployed. It is well known that deep convolutional networks are over-parameterized, and model compression has therefore become an important research area for deploying them on resource-constrained devices; the main directions include pruning, quantization, and decomposition. Among decomposition approaches, tensor decomposition is one of the key techniques for compressing deep convolutional networks, owing to its ability to uncover the linear relations hidden in the complex structure of convolution layers. However, existing methods still cannot offer a satisfactory trade-off between model acceleration and accuracy loss. In this work, we propose a tensor-decomposition-based compression method that applies low-rank approximation to the convolution layers of deep convolutional networks. First, a new training procedure is designed to better preserve accuracy during compression. Second, we cast the rank selection problem that arises when tensor decomposition is used for network compression as an optimization problem. Finally, we propose a new regularization function, called the funnel function, to determine the tensor ranks. Experimental results show that, compared with other tensor compression algorithms, our algorithm removes more parameters from the model while maintaining higher accuracy. On several representative large-scale convolutional networks, we achieve on average about one percentage point of accuracy loss with roughly a two-fold speedup in computation.
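To make the low-rank step concrete, the formula below is a minimal sketch of the widely used Tucker-2 factorization of a convolution kernel; the symbols ($S$, $T$, $d$, $R_3$, $R_4$) and the per-location MAC counts are illustrative notation, not necessarily the exact decomposition or ranks chosen in the thesis.

```latex
% Tucker-2 sketch: kernel K with S input channels, T output channels, d x d window
\mathcal{K}_{i,j,s,t} \;\approx\;
  \sum_{r_3=1}^{R_3}\sum_{r_4=1}^{R_4}
  \mathcal{C}_{i,j,r_3,r_4}\, U^{(3)}_{s,r_3}\, U^{(4)}_{t,r_4}
% The layer becomes a 1x1 conv (S -> R_3), a d x d conv (R_3 -> R_4),
% and a 1x1 conv (R_4 -> T); per output location the MAC count drops from
S\,T\,d^{2} \quad\text{to}\quad S R_3 + R_3 R_4 d^{2} + R_4 T .
```

Choosing the ranks $R_3$ and $R_4$ for every layer is exactly the selection problem that the optimizer-based method and the funnel regularizer described above are meant to solve.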
Tensor decomposition is one of the fundamental techniques for model compression of deep convolutional neural networks, owing to its ability to reveal the latent relations among complex structures. However, most existing methods compress the networks layer by layer, which cannot provide a satisfactory solution to achieve global optimization. In this thesis, we propose model reduction methods that compress pre-trained networks using low-rank approximation of the convolution layers. Our method uses optimization techniques to select proper ranks for the decomposed network layers. In addition, we redesign the compression flow and propose a new regularization function to better distinguish the desired and undesired structures. The experimental results show that our algorithm can remove more model parameters than other tensor compression methods. For ResNet-18 on ImageNet-2012, our reduced model reaches more than a 2x speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop, which outperforms all existing methods in both metrics.
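The replacement of a dense layer by its low-rank counterpart can be sketched in a few lines of PyTorch. This is a minimal structural sketch under the assumption that PyTorch is used: the function names (`lowrank_conv`, `macs_per_pixel`) and the chosen ranks are hypothetical illustrations, no pretrained weights are copied, and it is not the rank-selection or fine-tuning procedure proposed in the thesis.

```python
# Minimal sketch (PyTorch assumed): replace a convolution layer with a
# Tucker-2 style sequence of three smaller convolutions.
# Ranks r3/r4 and all names below are illustrative, not the thesis method.
import torch
import torch.nn as nn


def lowrank_conv(conv: nn.Conv2d, r3: int, r4: int) -> nn.Sequential:
    """Build a low-rank substitute for `conv` with input rank r3 and output rank r4."""
    s, t = conv.in_channels, conv.out_channels
    return nn.Sequential(
        # 1x1 conv: project the S input channels down to r3
        nn.Conv2d(s, r3, kernel_size=1, bias=False),
        # spatial conv at reduced width (plays the role of the Tucker core)
        nn.Conv2d(r3, r4, kernel_size=conv.kernel_size,
                  stride=conv.stride, padding=conv.padding, bias=False),
        # 1x1 conv: expand the r4 channels back to the T original outputs
        nn.Conv2d(r4, t, kernel_size=1, bias=conv.bias is not None),
    )


def macs_per_pixel(conv: nn.Conv2d) -> int:
    """Multiply-accumulate count per output location for a dense convolution."""
    k_h, k_w = conv.kernel_size
    return conv.in_channels * conv.out_channels * k_h * k_w


if __name__ == "__main__":
    original = nn.Conv2d(256, 256, kernel_size=3, padding=1)
    compressed = lowrank_conv(original, r3=64, r4=64)

    x = torch.randn(1, 256, 32, 32)
    print(original(x).shape, compressed(x).shape)        # same output shape

    dense = macs_per_pixel(original)                      # 256*256*9 = 589824
    low = 256 * 64 + 64 * 64 * 9 + 64 * 256               # = 69632
    print(f"approx. MAC reduction: {dense / low:.1f}x")   # ~8.5x for this toy setting
```

In an actual compression pipeline, the factor matrices obtained from the decomposition would be copied into the three new layers before fine-tuning; the sketch only illustrates the structure and where the computational savings come from.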