
Graduate Student: Lo, Pin (羅賓)
Thesis Title (Chinese): 不同設計的高效卷積神經網路在圖像分類中的比較研究
Thesis Title (English): Comparative Study for Different Designs of Efficient Convolutional Neural Network on the Image Classification
Advisor: Lee, Yu-Ching (李雨青)
Committee Members: Chang, Yung-Chia (張永佳); Chiu, Ming-Chuan (邱銘傳)
Degree: Master
Department: Department of Industrial Engineering and Engineering Management, College of Engineering
Year of Publication: 2019
Graduation Academic Year: 107
Language: English
Number of Pages: 94
Chinese Keywords (translated): convolutional neural network, fully connected layer, model compression, model acceleration, image classification
English Keywords: convolutional neural network, batch normalization, compression, acceleration, image classification


    For the past few years, convolutional neural networks (CNNs) have achieved high predictive accuracy in visual recognition problems such as face attribute analysis, object detection, fine-grained recognition, large-scale geolocation, and face embeddings. However, these successes rely on training computationally expensive, time-consuming, and memory-intensive deep-layered CNN models on costly equipment. To address this high computational complexity, many recent techniques, namely model compression and acceleration [40, 42], focus on designing modified, efficient CNN models with the aim of reducing training time without decreasing accuracy. This study implements an efficient CNN featuring (1) convolutional filter transformation, (2) layer-sequential unit-variance (LSUV) initialization, (3) an additional batch normalization layer, and (4) a simplified fully connected layer. We create a benchmark CNN model with LSUV initialization, 20 layers, and 1,071,542 parameters on a GTX 1060 GPU, and compare the performance of different efficient CNN designs against this benchmark. The problem we attempt to resolve with these designs is image classification, which is among the simplest visual recognition tasks. Finally, we construct the final model, which contains 622,630 parameters and takes only 15,217 seconds to reach 90.33% accuracy, and we create the simplified fully connected layer. In addition, we provide a guideline for designing efficient CNNs. The results of the final model also indicate that the simplified fully connected layer is a useful method. The images used in the numerical experiments are collected from the standard CIFAR-10 dataset.
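
    The design ingredients named above (a batch normalization layer after each convolution, a simplified classifier head in place of large fully connected layers, and LSUV initialization) can be pictured with the minimal TensorFlow/Keras sketch below. It is an illustration only, not the 20-layer benchmark or the 622,630-parameter final model of this thesis: the three-block layout, the filter widths, the global-average-pooling head, and the lsuv_init routine are assumptions made for this example.

# Minimal illustrative sketch (assumptions, not the thesis's actual model):
# a small CNN for CIFAR-10 with a batch normalization layer after every
# convolution, a simplified head (global average pooling + one dense layer
# instead of large fully connected layers), and an LSUV-style initializer.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models


def build_small_bn_cnn(num_classes: int = 10) -> tf.keras.Model:
    """Small conv -> batch norm -> ReLU network with a simplified classifier head."""
    inputs = layers.Input(shape=(32, 32, 3))
    x = inputs
    for filters in (32, 64, 128):           # illustrative widths, not the thesis's
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)  # "additional batch norm layer"
        x = layers.ReLU()(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)  # stands in for large dense layers
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)


def lsuv_init(model: tf.keras.Model, x_batch: np.ndarray,
              tol: float = 0.02, max_iter: int = 10) -> tf.keras.Model:
    """Rescale each conv/dense kernel until the layer's output has unit variance
    on x_batch (layer-sequential unit-variance initialization, after [39])."""
    for layer in model.layers:
        if not isinstance(layer, (layers.Conv2D, layers.Dense)):
            continue
        probe = models.Model(model.inputs, layer.output)  # expose this layer's output
        weights = layer.get_weights()
        for _ in range(max_iter):
            std = float(probe(x_batch, training=False).numpy().std())
            if abs(std - 1.0) < tol or std < 1e-8:
                break
            weights[0] = weights[0] / std                 # push output variance toward 1
            layer.set_weights(weights)
    return model


if __name__ == "__main__":
    (x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
    x_train = x_train.astype("float32") / 255.0
    model = build_small_bn_cnn()
    model = lsuv_init(model, x_train[:128])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train[:2000], y_train[:2000], epochs=1, batch_size=64)

    The lsuv_init sketch only follows the general idea of layer-sequential unit-variance initialization [39]: proceeding layer by layer, each convolutional or dense kernel is rescaled until that layer's output variance on a data batch is close to one.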

    Contents
    Chapter 1 Introduction 5
      1.1 Preliminaries on CNN 7
        1.1.1 Architecture of convolutional neural network 7
        1.1.2 Convolutional layer 12
        1.1.3 Rectified linear unit layer 14
        1.1.4 Pooling layer 15
      1.2 An Efficient CNN Layer 17
        1.2.1 Modeling 17
        1.2.2 Efficient convolution layer 18
    Chapter 2 Literature review 21
      2.1 Preliminaries on batch norm layer and LSUV method 22
        2.1.1 Weight initialization 22
        2.1.2 Regularization 23
      2.2 Batch norm layer 23
      2.3 LSUV: layer-sequential unit-variance initialization 25
    Chapter 3 Methodology 28
      3.1 Simplified fully connected layer 28
      3.2 The architecture of efficient CNN 29
    Chapter 4 Results 31
      4.1 The environment of experiments 31
      4.2 Database 31
      4.3 Experiments 32
        4.3.1 Dropout 33
        4.3.2 Efficient convolutional layer 39
        4.3.3 Max-pooling layer 46
        4.3.4 Simplified fully connected layer 50
        4.3.5 LSUV and batch norm layer 56
        4.3.6 MobileNet and large model 59
        4.3.7 Final model 61
    Chapter 5 Conclusions 71
    References 72
    Appendix 79

    [1] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Kudlur, M. (2016, November). Tensorflow: a system for large-scale machine learning. In OSDI (Vol. 16, pp. 265-283).
    [2] Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. In Advances in neural information processing systems (pp. 971-980).
    [3] Benchmarks.AI. Retrieved December 10, 2018, from https://benchmarks.ai/cifar-10.
    [4] Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in neural information processing systems (pp. 153-160).
    [5] Chen, D., & Manning, C. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 740-750).
    [6] Chen, Q., Xu, J., & Koltun, V. (2017, October). Fast image processing with fully-convolutional networks. In IEEE International Conference on Computer Vision (Vol. 9, pp. 2516-2525).
    [7] Cheng, Y., Wang, D., Zhou, P., & Zhang, T. (2017). A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282.
    [8] Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 1800-1807.
    [9] Cohen, T., & Welling, M. (2016, June). Group equivariant convolutional networks. In International conference on machine learning (pp. 2990-2999).
    [10] Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., ... & Ng, A. Y. (2012). Large scale distributed deep networks. In Advances in neural information processing systems (pp. 1223-1231).
    [11] Dong, C., Loy, C. C., He, K., & Tang, X. (2016). Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence, 38(2), 295-307.
    [12] Giusti, A., Ciresan, D. C., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2013, September). Fast image scanning with deep max-pooling convolutional neural networks. In Image Processing (ICIP), 2013 20th IEEE International Conference on (pp. 4034-4038). IEEE.
    [13] Glorot, X., & Bengio, Y. (2010, March). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 249-256).
    [14] Glorot, X., Bordes, A., & Bengio, Y. (2011, June). Deep sparse rectifier neural networks. In Proceedings of the fourteenth international conference on artificial intelligence and statistics (pp. 315-323).
    [15] Gong, Y., Liu, L., Yang, M., & Bourdev, L. D. (2014). Compressing deep convolutional networks using vector quantization. CoRR, abs/1412.6115.
    [16] Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):1319-1327.
    [17] Graham, B. (2014). Fractional max-pooling. arXiv preprint arXiv:1412.6071.
    [18] Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., ... & Chen, T. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354-377.
    [19] Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. In Proceedings of the 2016 International Conference on Learning Representation.
    [20] Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both weights and connections for efficient neural network. In Advances in neural information processing systems (pp. 1135-1143).
    [21] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
    [22] He, Y., Zhang, X., & Sun, J. (2017, October). Channel pruning for accelerating very deep neural networks. In International Conference on Computer Vision (ICCV) (Vol. 2, No. 6).
    [23] Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. CoRR, abs/1503.02531.
    [24] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., ... & Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
    [25] Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017, July). Densely connected convolutional networks. In CVPR (Vol. 1, No. 2, p. 3).
    [26] Huang, G., Sun, Y., Liu, Z., Sedra, D., & Weinberger, K. Q. (2016, October). Deep networks with stochastic depth. In European Conference on Computer Vision (pp. 646-661). Springer, Cham.
    [27] Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., & Keutzer, K. (2016). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5 MB model size. arXiv preprint arXiv:1602.07360.
    [28] Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:448-456.
    [29] Jaderberg, M., Simonyan, K., & Zisserman, A. (2015). Spatial transformer networks. In Advances in neural information processing systems (pp. 2017-2025).
    [30] Jaderberg, M., Vedaldi, A., & Zisserman, A. (2014). Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866.
    [31] Jin, J., Dundar, A., & Culurciello, E. (2014). Flattened convolutional neural networks for feedforward acceleration. CoRR, abs/1412.5474.
    [32] Kim, J. K., Lee, M. Y., Kim, J. Y., Kim, B. J., & Lee, J. H. (2016, October). An efficient pruning and weight sharing method for neural network. In Consumer Electronics-Asia (ICCE-Asia), IEEE International Conference on (pp. 1-2). IEEE.
    [33] Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images, https://www.cs.toronto.edu/~kriz/cifar.html.
    [34] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
    [35] Lavin, A., & Gray, S. (2016). Fast algorithms for convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4013-4021).
    [36] Lebedev, V., & Lempitsky, V. (2016). Fast convnets using group-wise brain damage. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2554-2564).
    [37] Lin, M., Chen, Q., & Yan, S. (2013). Network in network. arXiv preprint arXiv:1312.4400.
    [38] Lu, Y., Kumar, A., Zhai, S., Cheng, Y., Javidi, T., & Feris, R. S. (2017, July). Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification. In CVPR (Vol. 1, No. 2, p. 6).
    [39] Mishkin, D., & Matas, J. (2015). All you need is a good init. In Proceedings of the 2016 International Conference on Learning Representation.
    [40] Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807-814).
    [41] Nasrabadi, N. M. (2007). Pattern recognition and machine learning. Journal of electronic imaging, 16(4), 049901.
    [42] Ng, A. Y. (2004, July). Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the twenty-first international conference on Machine learning (p. 78). ACM.
    [43] Nguyen, D., & Widrow, B. (1990, June). Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In Neural Networks, 1990., 1990 IJCNN International Joint Conference on (pp. 21-26). IEEE.
    [44] Ovtcharov, K., Ruwase, O., Kim, J. Y., Fowers, J., Strauss, K., & Chung, E. S. (2015). Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper, 2(11).
    [45] Park, M. Y., & Hastie, T. (2007). L1‐regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4), 659-677.
    [46] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of machine learning research, 12(Oct), 2825-2830.
    [47] Povey, D., Zhang, X., & Khudanpur, S. (2014). Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging. CoRR, abs/1410.7455.
    [48] Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, (6), 1137-1149.
    [49] Rigamonti, R., Sironi, A., Lepetit, V., & Fua, P. (2013). Learning separable filters. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2754-2761).
    [50] Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., & Bengio, Y. (2014). Fitnets: Hints for thin deep nets. In Proceedings of the 2015 International Conference on Learning Representation.
    [51] Salimans, T., & Kingma, D. P. (2016). Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Advances in Neural Information Processing Systems (pp. 901-909).
    [52] Saxe, A. M., McClelland, J. L., & Ganguli, S. (2013). Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In Y. Bengio & Y. LeCun (Eds.), CoRR, abs/1312.6120.
    [53] Schölkopf, B., Smola, A. J., & Bach, F. (2002). Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press.
    [54] Sercu, T., Puhrsch, C., Kingsbury, B., & LeCun, Y. (2016). Very deep multilingual convolutional neural networks for LVCSR. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on (pp. 4955-4959). IEEE.
    [55] Shang, W., Sohn, K., Almeida, D., & Lee, H. (2016, June). Understanding and improving convolutional neural networks via concatenated rectified linear units. In International Conference on Machine Learning (pp. 2217-2225).
    [56] Shimodaira, H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of statistical planning and inference, 90(2), 227-244.
    [57] Sifre, L., & Mallat, S. (2014). Rigid-motion scattering for image classification (Doctoral dissertation).
    [58] Wiesler, S., Richard, A., Schlüter, R., & Ney, H. (2014, May). Mean-normalized stochastic gradient for large-scale deep learning. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Florence, Italy (pp. 180-184).
    [59] Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014). Striving for simplicity: The all convolutional net. In Proceedings of the 2015 International Conference on Learning Representation.
    [60] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
    [61] Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013, February). On the importance of initialization and momentum in deep learning. In International conference on machine learning (pp. 1139-1147).
    [62] Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017, February). Inception-v4, inception-resnet and the impact of residual connections on learning. In AAAI (Vol. 4, p. 12).
    [63] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9).
    [64] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818-2826).
    [65] Vanhoucke, V., Senior, A., & Mao, M. Z. (2011, December). Improving the speed of neural networks on CPUs. In Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop (Vol. 1, p. 4).
    [66] Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., & Fergus, R. (2013, February). Regularization of neural networks using dropconnect. In International Conference on Machine Learning (pp. 1058-1066).
    [67] Wang, M., Liu, B., & Foroosh, H. (2016). Design of efficient convolutional layers using single intra-channel convolution, topological subdivisioning and spatial "bottleneck" structure. arXiv preprint arXiv:1608.04337.
    [68] Wu, H., & Gu, X. (2015, November). Max-pooling dropout for regularization of convolutional neural networks. In International Conference on Neural Information Processing (pp. 46-54). Springer, Cham.
    [69] Yam, J. Y., & Chow, T. W. (2000). A weight initialization method for improving training speed in feedforward neural network. Neurocomputing, 30(1-4), 219-232.
    [70] Yin, Z., Wan, B., Yuan, F., Xia, X., & Shi, J. (2017). A deep normalization and convolutional neural network for image smoke detection. IEEE Access, 5, 18429-18438.
    [71] Zeiler, M., & Fergus, R. (2013). Stochastic pooling for regularization of deep convolutional neural networks. In Proceedings of the 2013 International Conference on Learning Representation.
    [72] Zhai, S., Cheng, Y., Zhang, Z. M., & Lu, W. (2016). Doubly convolutional neural networks. In Advances in neural information processing systems (pp. 1082-1090).
