Graduate student: | 林哲維 Lin, Che-Wei |
---|---|
Thesis title: | 多重損失評估卷積網路於影像內容分割之應用 Multi-loss Convolutional Networks for Semantic Segmentation |
Advisor: | 陳煥宗 Hwann-Tzong Chen |
Committee members: | 劉庭祿 Tyng-Luh Liu, 賴尚宏 Shang-Hong Lai |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2015 |
Academic year of graduation: | 103 (ROC calendar, 2014–2015) |
Language: | English |
Number of pages: | 25 |
Chinese keywords: | 卷積類神經網路, 影像內容分割, 多重損失評估 |
English keywords: | Convolutional Neural Network, Semantic Segmentation, Multi-loss |
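This thesis studies the recognition and segmentation of objects in scenes. By training a deep convolutional neural network, every pixel in a photograph can be classified. Unlike traditional learning methods that are not based on neural networks, this approach does not require extracting a large number of different feature vectors. To train the network, we raise the dropout rate and use several different loss (energy) functions to measure the network's performance, which improves the recognition rate for small objects more than measuring with a single loss function does. Building on the high detection rate that this method achieves across different classes, we then design a method that expands the area of small objects so that they are classified more prominently.
Finally, we use the experimental results to discuss possible improvements, examine how the technique of expanding small-object areas relates to segmentation accuracy, and assess whether multiple loss functions help a neural network perform several different tasks at the same time.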
This thesis presents a semantic segmentation method based on a fully convolutional network (FCN). We focus on increasing mean-class accuracy by adding steps that help the FCN find more small objects: i) modulating the dropout rates, ii) combining multiple loss functions, and iii) expanding small-object areas.
Our approach shows that these steps can significantly increase mean-class accuracy without sacrificing too much per-pixel accuracy. We also provide experimental observations on the relationship between the area-expanding method and the CNN model. Finally, we discuss how to improve the workflow and what we have learned from the experiments on training with multiple loss functions.
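The trade-off between the two metrics named in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the function names, the toy labels, and the two hypothetical predictions are assumptions chosen only to show why finding small objects tends to raise mean-class accuracy while slightly lowering per-pixel accuracy. Mean-class accuracy is computed here as the average per-class recall over the classes present in the ground truth, which is the usual definition.

```python
from collections import Counter

def per_pixel_accuracy(pred, truth):
    """Fraction of all pixels whose predicted label matches the ground truth."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)

def mean_class_accuracy(pred, truth):
    """Average per-class recall over the classes present in the ground truth."""
    total = Counter(truth)
    hits = Counter(t for p, t in zip(pred, truth) if p == t)
    return sum(hits[c] / total[c] for c in total) / len(total)

# Toy flattened label map (hypothetical): 95 "sky" pixels and a 5-pixel "bird".
truth = ["sky"] * 95 + ["bird"] * 5

# Prediction A ignores the small object entirely.
pred_a = ["sky"] * 100
# Prediction B finds 4 of the 5 bird pixels but mislabels 6 sky pixels as bird.
pred_b = ["sky"] * 89 + ["bird"] * 6 + ["bird"] * 4 + ["sky"] * 1

acc_a = (per_pixel_accuracy(pred_a, truth), mean_class_accuracy(pred_a, truth))
acc_b = (per_pixel_accuracy(pred_b, truth), mean_class_accuracy(pred_b, truth))
print("A: per-pixel, mean-class =", acc_a)
print("B: per-pixel, mean-class =", acc_b)
```

In this toy case, prediction B scores lower than A on per-pixel accuracy but much higher on mean-class accuracy, because every class counts equally in the latter regardless of how few pixels it covers.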