Graduate Student: 王傳凱 (Wang, Chuan-Kai)
Thesis Title: 基於全域卷積網路與串連特徵圖的語意分割 (Semantic Segmentation via Global Convolutional Network and Concatenated Feature Maps)
Advisor: 張隆紋 (Chang, Long-Wen)
Committee Members: 邱瀞德 (Chiu, Ching-Te), 許秋婷 (Hsu, Chiou-Ting)
Degree: Master
Department:
Year of Publication: 2018
Academic Year of Graduation: 106
Language: English
Number of Pages: 39
Keywords: Semantic Segmentation, Deep Learning, Convolutional Neural Network (CNN)
After Long et al. proposed the fully convolutional network (FCN), built on a pre-trained classification convolutional neural network (CNN), many methods began using classification CNNs to construct their semantic segmentation CNNs and achieved state-of-the-art performance on challenging datasets. Since ResNet won first place in the 2015 ImageNet classification task, most segmentation CNNs have been built on top of ResNet.

Recently, Huang et al. introduced a new classification CNN called DenseNet. Jégou et al. then built an entire segmentation CNN, called FC-DenseNet, from nothing but a sequence of DenseNet building blocks, and achieved state-of-the-art results on the CamVid dataset.

In this thesis, we aim to show that implementing the DenseNet building block directly in a segmentation CNN is not the best way to use it. We therefore apply the design concept of DenseNet to GCN, a ResNet-based segmentation CNN, and build our own network. With fewer training parameters, our network obtains a mean-IoU score of 69.34% on the CamVid dataset, surpassing the 66.9% reported in the FC-DenseNet paper.
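The comparison above is stated in terms of mean IoU (intersection over union, averaged over classes), the standard semantic-segmentation metric on CamVid. As a hedged illustration of how this metric is typically computed — the function name, class-skipping rule, and 3-class toy label maps below are my own, not taken from the thesis:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes.

    For each class c: IoU_c = TP_c / (TP_c + FP_c + FN_c).
    Classes absent from both prediction and ground truth are skipped.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (target == c))  # correctly labeled c
        fp = np.sum((pred == c) & (target != c))  # predicted c, wrong
        fn = np.sum((pred != c) & (target == c))  # missed c
        denom = tp + fp + fn
        if denom == 0:
            continue  # class does not occur at all; do not count it
        ious.append(tp / denom)
    return float(np.mean(ious))

# Toy 2x4 label maps with 3 classes (illustrative values only).
target = [[0, 0, 1, 1],
          [2, 2, 1, 1]]
pred   = [[0, 1, 1, 1],
          [2, 2, 1, 1]]
print(round(mean_iou(pred, target, num_classes=3), 4))  # → 0.7667
```

Here class 0 scores 1/2, class 1 scores 4/5, and class 2 scores 1, giving a mean of about 0.767; a reported score such as 69.34% is this quantity averaged over the CamVid classes.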
[1] A. Krizhevsky, I. Sutskever and G.E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Conference on Neural Information Processing Systems, 2012, pp. 1097-1105.
[2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, abs/1409.1556, 2014.
[3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, “Going deeper with convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[4] K. He, X. Zhang, S. Ren and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[5] G. Huang, Z. Liu, L. van der Maaten and K. Q. Weinberger, “Densely connected convolutional networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2261-2269.
[6] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li and F. F. Li, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248-255.
[7] T. Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick and P. Dollár, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision, 2014, pp. 740-755.
[8] A. Krizhevsky and G. E. Hinton, “Learning multiple layers of features from tiny images,” Technical Report, 2009.
[9] J. Long, E. Shelhamer and T. Darrell, “Fully convolutional networks for semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[10] G. Lin, A. Milan, C. Shen and I. Reid, “RefineNet: Multi-path refinement networks for high-resolution semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5168-5177.
[11] C. Peng, X. Zhang, G. Yu, G. Luo and J. Sun, “Large kernel matters -- Improve semantic segmentation by global convolutional network,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1743-1751.
[12] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero and Y. Bengio, “The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1175-1183.
[13] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” in International Journal of Computer Vision, 2010, pp. 303-338.
[14] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3213-3223.
[15] G. J. Brostow, J. Shotton, J. Fauqueur and R. Cipolla, “Segmentation and recognition using structure from motion point clouds,” in European Conference on Computer Vision, 2008, pp. 44-57.
[16] L. C. Chen, Y. Yang, J. Wang, W. Xu and A. L. Yuille, “Attention to scale: Scale-aware semantic image segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3640-3649.
[17] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, pp. 834-848.
[18] H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia, “Pyramid scene parsing network,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881-2891.
[19] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448-456.
[20] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in International Conference on Machine Learning, 2010, pp. 807-814.
[21] H. Noh, S. Hong and B. Han, “Learning deconvolution network for semantic segmentation,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1520-1528.
[22] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga and A. Lerer, “Automatic differentiation in PyTorch,” 2017.
[23] K. He, X. Zhang, S. Ren and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026-1034.
[24] D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” CoRR, abs/1412.6980, 2014.