| Field | Value |
|---|---|
| Author | 鍾杰峰 Chung, Chieh-Feng |
| Title | 以簡化群體演算法優化卷積神經網路結構及超參數調整 (Optimization Convolutional Neural Network Architecture and Hyperparameter Using Simplified Swarm Optimization) |
| Advisor | 葉維彰 Yeh, Wei-Chang |
| Committee members | 邱銘傳 Chiu, Ming-Chuan; 黃佳玲 Huang, Chia-Ling |
| Degree | Master |
| Department | College of Engineering, Department of Industrial Engineering and Engineering Management |
| Publication year | 2020 |
| Graduation academic year | 108 (2019-2020) |
| Language | English |
| Pages | 50 |
| Keywords (Chinese) | 神經網路架構搜尋、超參數優化、卷積神經網路 |
| Keywords (English) | Neural architecture search; Hyperparameter optimization; Convolutional neural network |
Abstract (Chinese):

Deep learning is a powerful machine learning technique that plays an important role in image classification, semantic image segmentation, and related tasks. Owing to the high dimensionality of images and the variability among images of the same class, image classification has long been one of the most challenging tasks in machine learning, and convolutional neural networks (CNNs) perform exceptionally well on it. However, manually designing a strong CNN model not only requires extensive domain knowledge but also consumes a great deal of time. Neural architecture search (NAS) and hyperparameter optimization (HPO) can automatically search for good combinations of network architecture and hyperparameters through a variety of search strategies. This study proposes an algorithm based on Simplified Swarm Optimization (SSO) as the search strategy, incorporating three different update mechanisms to find good combinations of CNN architecture and hyperparameters. In addition, the proposed algorithm weighs the importance and the value range of each variable during optimization. To allow the CNN architecture and hyperparameters to be optimized by the proposed search strategy, an encoding strategy is needed to map the architecture and hyperparameter information into the algorithm's encoding; this study therefore also proposes a three-level encoding strategy that encodes the CNN architecture and hyperparameters into algorithm solutions. To verify the performance of the proposed method, it is compared on the Convex and MNIST-RB datasets against evolutionary NAS methods and well-known CNN models. The experimental results show that the proposed method achieves highly competitive performance.
Abstract (English):

Deep learning is one of the most powerful machine learning techniques and plays an important role in image classification, semantic image segmentation, and other tasks. Among these, image classification is one of the hardest tasks for machine learning due to the high dimensionality of images and the variability within each class, and Convolutional Neural Networks (CNNs) hold an important place in it. However, designing a prominent CNN manually requires ample domain knowledge and is time-consuming. Neural architecture search (NAS) and hyperparameter optimization (HPO) can automatically search for good combinations of network architecture and hyperparameters through a variety of search strategies. In this study, we propose an SSO-based algorithm as a search strategy that applies three different kinds of update mechanisms to find good combinations of CNN structure and hyperparameters. Moreover, the algorithm takes the importance and the value range of each variable into consideration during the search process. To allow a CNN's architecture and hyperparameters to be optimized by the search strategy, an encoding strategy is required that maps the network's information into a series of encodings. We encode the CNN structure and hyperparameters into solutions using a three-level encoding strategy, with each level carrying different CNN information. To verify our approach's performance, we compare image classification accuracy on the Convex and MNIST-RB datasets against competitors, including evolutionary NAS approaches and well-known CNN models. The experimental results indicate that our approach achieves promising performance compared to its rivals.
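As a rough illustration of how such a search can be wired together, the sketch below pairs a hypothetical three-level solution encoding (network, block, and layer levels) with the classic variable-wise SSO update rule from Yeh's earlier work. The level layout, variable ranges, probability thresholds, and the dummy fitness function are illustrative assumptions, not the exact scheme, three update mechanisms, or importance weighting used in the thesis.

```python
import random

# Hypothetical three-level encoding; the thesis's actual layout may differ.
# Level 1 (network): number of convolutional blocks.
# Level 2 (block):   layer type per block (0 = conv, 1 = pooling).
# Level 3 (layer):   hyperparameters per block (filters, kernel size).
VARIABLE_RANGES = [
    (2, 6),             # level 1: number of blocks
    (0, 1), (0, 1),     # level 2: layer types for two blocks (illustrative)
    (16, 128), (1, 5),  # level 3: filters / kernel size, block 1
    (16, 128), (1, 5),  # level 3: filters / kernel size, block 2
]

def random_solution():
    """Sample one encoded solution uniformly from each variable's range."""
    return [random.randint(lo, hi) for lo, hi in VARIABLE_RANGES]

def sso_update(x, pbest, gbest, cg=0.4, cp=0.7, cw=0.9):
    """Classic stepwise SSO update: per variable, one uniform draw decides
    whether to copy the global best, the personal best, keep the current
    value, or resample at random (thresholds cg < cp < cw are assumed)."""
    new_x = []
    for j, (lo, hi) in enumerate(VARIABLE_RANGES):
        rho = random.random()
        if rho < cg:
            new_x.append(gbest[j])
        elif rho < cp:
            new_x.append(pbest[j])
        elif rho < cw:
            new_x.append(x[j])
        else:
            new_x.append(random.randint(lo, hi))
    return new_x

def fitness(solution):
    """Placeholder: in the thesis, a solution is decoded into a CNN, trained,
    and scored by validation accuracy; a dummy objective stands in here."""
    return -sum(solution)

# Minimal search loop over a small population.
pop = [random_solution() for _ in range(10)]
pbests = list(pop)
gbest = max(pbests, key=fitness)
for _ in range(20):
    for i, x in enumerate(pop):
        pop[i] = sso_update(x, pbests[i], gbest)
        if fitness(pop[i]) > fitness(pbests[i]):
            pbests[i] = pop[i]
    gbest = max(pbests, key=fitness)
print("best encoded solution:", gbest)
```

In a real NAS/HPO run, `fitness` would decode the solution into a CNN, train it briefly, and return validation accuracy; that evaluation step, not the SSO update itself, dominates the runtime.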