
Student: Hsu, Hui-Wen (徐慧文)
Thesis Title: Enhanced Context Network with ShufflenetV2 for Real-Time Semantic Segmentation
Advisor: Chang, Long-Wen (張隆紋)
Committee Members: Chiu, Ching-Te (邱瀞德); Hu, Min-Chun (胡敏君)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of Publication: 2019
Graduation Academic Year: 108
Language: English
Number of Pages: 31
Chinese Keywords: real-time semantic segmentation (實時語意切割)
Foreign Keywords: real-time, semantic-segmentation


    In recent years, autonomous driving has become a popular research topic, and a major research direction is how to run deep learning networks in real time on limited hardware resources. Most methods reduce the computational cost or the number of parameters of the network model to strike a balance between accuracy and speed. Most recent real-time semantic segmentation networks combine a light-weight convolutional neural network (CNN) with an attention mechanism, which assigns different levels of importance to the information in the network, to boost performance.
    In this thesis, we propose the Enhanced Context Network, which uses the light-weight neural network ShufflenetV2 [14] to extract image features, so that the network obtains semantically rich feature maps with few parameters. Because the spatial information extracted by ShufflenetV2 is insufficient, we also propose a pixel attention block. Its core idea is to establish the correlation between each pixel and all other pixels with few parameters, so the proposed network enhances the extraction of spatial information, restores more image details, and improves accuracy. In addition, we fuse low-level feature maps with high-level feature maps layer by layer in the manner of U-net [2] to further improve segmentation performance.
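    The abstract does not give the block's exact layers, so the following is only a minimal PyTorch sketch of a pixel attention block in the non-local / self-attention style of [8] and [23], which matches the description above: pixel-to-pixel affinities computed with cheap 1x1 convolutions. The class name PixelAttentionBlock and the channel-reduction factor are illustrative assumptions, not the design from the thesis.

        import torch
        import torch.nn as nn

        class PixelAttentionBlock(nn.Module):
            # Illustrative sketch: relates every pixel to every other pixel
            # (non-local self-attention); 1x1 convolutions with channel
            # reduction keep the parameter count low.
            def __init__(self, in_channels, reduction=8):
                super().__init__()
                mid = in_channels // reduction
                self.query = nn.Conv2d(in_channels, mid, kernel_size=1)
                self.key = nn.Conv2d(in_channels, mid, kernel_size=1)
                self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
                self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

            def forward(self, x):
                b, c, h, w = x.shape
                n = h * w
                q = self.query(x).view(b, -1, n).permute(0, 2, 1)  # (B, N, C')
                k = self.key(x).view(b, -1, n)                     # (B, C', N)
                attn = torch.softmax(torch.bmm(q, k), dim=-1)      # (B, N, N) pixel-to-pixel affinity
                v = self.value(x).view(b, c, n)                    # (B, C, N)
                out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
                return x + self.gamma * out                        # residual: starts as identity

    Because the attention map is quadratic in the number of pixels, a block of this kind is normally attached to the low-resolution, high-level stages of the backbone (for ShufflenetV2 1x, e.g. the 464-channel stage), which also keeps the runtime compatible with the real-time goal.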
    We evaluated our network on the CamVid and Cityscapes datasets, obtaining a mean IoU of 67.54% on CamVid and 68.2% on Cityscapes. The results show that the proposed pixel attention block helps capture spatial information and thereby improves accuracy. Because the pixel attention block adds only a small number of parameters, our network can be applied to real-time semantic segmentation.
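    For reference, mean IoU (mIoU) is the per-class intersection-over-union averaged over all classes; the sketch below shows the standard way to compute it from a confusion matrix (the metric's usual definition, not code from the thesis):

        import numpy as np

        def mean_iou(conf):
            # conf[t, p] counts pixels whose true class is t and predicted class is p.
            tp = np.diag(conf).astype(np.float64)              # per-class true positives
            union = conf.sum(0) + conf.sum(1) - tp             # TP + FP + FN per class
            return float(np.mean(tp / np.maximum(union, 1)))   # average IoU over classes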

    Chapter 1. Introduction  1
    Chapter 2. Related Work  5
    Chapter 3. The Proposed Method  11
    Chapter 4. Experiment Results  20
    Chapter 5. Conclusions  28
    Reference  29

    [1] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440.
    [2] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.
    [3] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, 2017.
    [4] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, "ENet: A deep neural network architecture for real-time semantic segmentation," arXiv preprint arXiv:1606.02147, 2016.
    [5] H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, "ICNet for real-time semantic segmentation on high-resolution images," arXiv preprint arXiv:1704.08545, 2017.
    [6] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, "BiSeNet: Bilateral segmentation network for real-time semantic segmentation," arXiv preprint arXiv:1808.00897, 2018.
    [7] H. Li, P. Xiong, H. Fan, and J. Sun, "DFANet: Deep feature aggregation for real-time semantic segmentation," arXiv preprint arXiv:1904.02216, 2019.
    [8] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    [9] Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, "CCNet: Criss-cross attention for semantic segmentation," arXiv preprint arXiv:1811.11721, 2018.
    [10] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
    [11] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," arXiv preprint arXiv:1610.02357, 2017.
    [12] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
    [13] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation," arXiv preprint arXiv:1801.04381, 2018.
    [14] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, "ShuffleNet V2: Practical guidelines for efficient CNN architecture design," in European Conference on Computer Vision (ECCV), 2018, pp. 122-138.
    [15] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2018.
    [16] Y.-H. Tsai, W.-C. Hung, S. Schulter, K. Sohn, M.-H. Yang, and M. Chandraker, "Learning to adapt structured output space for semantic segmentation," arXiv preprint arXiv:1802.10349, 2018.
    [17] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6230-6239.
    [18] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," arXiv preprint arXiv:1608.06993, 2016.
    [19] M. Yang, K. Yu, C. Zhang, Z. Li, and K. Yang, "DenseASPP for semantic segmentation in street scenes," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    [20] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," arXiv preprint arXiv:1709.01507, 2017.
    [21] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 5998-6008.
    [22] J. Cheng, L. Dong, and M. Lapata, "Long short-term memory-networks for machine reading," arXiv preprint arXiv:1601.06733, 2016.
    [23] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, "Self-attention generative adversarial networks," arXiv preprint arXiv:1805.08318, 2018.
    [24] G. J. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla, "Segmentation and recognition using structure from motion point clouds," in European Conference on Computer Vision (ECCV), 2008, pp. 44-57.
    [25] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes dataset for semantic urban scene understanding," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
