
Author: Hsiao, Ting-Yun (蕭婷云)
Title: Convolutional Network Compression with Global Average Pooling, Pruning, Truncated SVD and Binary Representation Center Clustering
Advisor: Chiu, Ching-Te (邱瀞德)
Committee Members: Chang, Long-Wen (張隆紋); Huang, Chao-Tsung (黃朝宗)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Communications Engineering
Year of Publication: 2017
Graduation Academic Year: 106 (ROC calendar, 2017-2018)
Language: English
Number of Pages: 45
Keywords: Deep Convolutional Model Compression, Global Average Pooling (GAP), Pruning, Truncated SVD, Binary Representation Center Clustering


Abstract:
Deep neural networks are powerful, but using them is both memory and time consuming because of their numerous parameters and the large amount of computation they require. Many studies compress these models at the parameter level and at the bit level. At the parameter level, researchers often start with the fully-connected layers, which consume the most memory; for the parameters in all layers, a common method is pruning, a time-consuming iterative process of removing unimportant weights and then fine-tuning the model. At the bit level, many studies use quantization to cut down the number of bits needed. However, some of these studies apply encoding methods to store each parameter with fewer bits, yet still record their codebooks in high-precision floating-point numbers.
Hence, we propose an efficient strategy that compresses the layers which consume the most computation or memory. We first compress the model at the parameter level by adding global average pooling before the first fully-connected layer, which reduces the dimension of that layer's weight matrix. Second, we iteratively prune filters in the convolutional layers and rows of the weight matrices in the fully-connected layers to proportionally reduce each layer's computation; for this process, we propose an order-deciding scheme so that the pruning proceeds more efficiently. Then we perform truncated singular value decomposition (TSVD) on a fully-connected layer, splitting one layer with a large weight matrix into two sub-layers with small weight matrices. Finally, at the bit level, we cluster each layer's weights and record the cluster centers in a 16-bit binary representation, instead of the 32- or 64-bit floating-point numbers used in previous works.
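To make the pipeline concrete, the following NumPy sketch walks through simplified versions of the four steps on random weights. All shapes, the rank k, the number of clusters, and the plain L1-magnitude pruning criterion are illustrative assumptions, not the settings or exact algorithms of the thesis (in particular, the proposed order-deciding scheme is not reproduced here).

    import numpy as np

    rng = np.random.default_rng(0)

    # --- Step 1: global average pooling (GAP) before the first FC layer ---
    # Averaging each channel's 7x7 map turns a 512*7*7 = 25088-dim FC input
    # into a 512-dim one, so the first FC weight matrix shrinks by 49x.
    fmap = rng.standard_normal((512, 7, 7)).astype(np.float32)       # C x H x W
    fc_input = fmap.mean(axis=(1, 2))                                # shape (512,)

    # --- Step 2: filter pruning with a simple L1-magnitude criterion ---
    conv_w = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)  # out,in,kh,kw
    norms = np.abs(conv_w).sum(axis=(1, 2, 3))        # one L1 norm per filter
    keep = np.sort(np.argsort(norms)[16:])            # drop the 16 weakest
    conv_w_pruned = conv_w[keep]                      # 64 -> 48 filters

    # --- Step 3: truncated SVD splits one m x n FC matrix into two factors ---
    m, n, k = 1024, 1024, 128                         # hypothetical size and rank
    W = rng.standard_normal((m, n)).astype(np.float32)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]                              # m x k: first sub-layer
    B = Vt[:k, :]                                     # k x n: second sub-layer
    # y = W @ x becomes y = A @ (B @ x); storage drops from m*n to k*(m+n)
    # values, here 1,048,576 -> 262,144, a 4x reduction.

    # --- Step 4: center clustering with a 16-bit codebook ---
    def cluster_weights(w, n_centers=32, n_iters=10):
        """1-D k-means over one layer's weights; the codebook is stored in
        float16 (a 16-bit binary representation) instead of float32/64."""
        flat = w.ravel().astype(np.float32)
        # spread the initial centers over the weight distribution
        centers = np.quantile(flat, np.linspace(0, 1, n_centers)).astype(np.float16)
        for _ in range(n_iters):
            # assign each weight to its nearest center, then update the centers
            codes = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
            for c in range(n_centers):
                members = flat[codes == c]
                if members.size:                      # keep empty clusters as-is
                    centers[c] = members.mean().astype(np.float16)
        return codes.astype(np.uint8), centers        # 1-byte codes, tiny codebook

    codes, centers = cluster_weights(B)
    B_restored = centers[codes].astype(np.float32).reshape(B.shape)

Storing B as 8-bit codes plus the 32-entry float16 codebook replaces each 4-byte weight with one byte and a shared 64-byte table, a further roughly 4x reduction on top of the TSVD split.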
Experiments on the VGG16 model show that we reach a 60.9× compression ratio in off-line storage while losing only 0.848% and 0.1378% accuracy on the top-1 and top-5 classification results, respectively, on the ILSVRC2012 validation set.
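As a rough sanity check on what such a ratio implies (assuming the commonly cited figure of about 138 million VGG16 parameters stored as 32-bit floats, which is not taken from the thesis):

    # Back-of-the-envelope arithmetic for a 60.9x off-line compression ratio.
    num_params = 138_000_000                 # approximate VGG16 parameter count
    original_mb = num_params * 4 / 2**20     # 32-bit floats -> about 526 MB
    compressed_mb = original_mb / 60.9       # reported ratio -> about 8.6 MB
    print(f"{original_mb:.0f} MB -> {compressed_mb:.1f} MB")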

Table of Contents:
1 Introduction
  1.1 Motivation and Problem Description
  1.2 Goal and Contribution
  1.3 Thesis Organization
2 Related Works
  2.1 Compressing on parameter-level
  2.2 Compressing on bit-level
  2.3 Compressing on both levels
3 CNN Compression with Compounded Skills
  3.1 Global Average Pooling
  3.2 Pruning
  3.3 Truncated SVD
  3.4 Binary Representation Center Clustering
4 Experimental Results
  4.1 Intermediate and Final Results
  4.2 Analysis of using different choices
    4.2.1 With and Without Proposed Pruning Scheme
    4.2.2 Number of Sub-layers in TSVD and Their Rank
  4.3 Comparison with Other Works
5 Conclusion and Future Work

