
Graduate Student: Liu, Yu-Wei
Thesis Title: DAFNet: Dual Attention with Feature Aggregation Network for Real-Time Semantic Segmentation
Advisor: Chang, Long-Wen
Committee Members: Lee, Chun-Yi; Chen, Chaur-Chin
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: 2020
Academic Year: 108
Language: English
Pages: 40
Keywords: semantic segmentation, attention, feature aggregation, deep learning


    In recent years, semantic segmentation networks have tried to reduce computing resource consumption to run in real time, but this often comes at the cost of accuracy. To address this problem, we improve accuracy with only a small increase in computation. We design DAFNet (Dual Attention with Feature Aggregation Network) based on BiSeNet (Bilateral Segmentation Network). Like BiSeNet, our method comprises two paths: a context path and a spatial path. The context path uses a deeper network to obtain precise semantic information, while the spatial path uses a shallower network to preserve image detail. To model the relationships among image features, we propose a Dual Attention Block (DAB) that contains a channel-wise attention module and a point-wise attention module. We design the DAB with dual inputs to exploit the complementarity of high-level and low-level features. To meet real-time demands, we accelerate feature fusion with a Path Fusion Block (PFB) and use asymmetric non-local blocks to reduce the computation of the DAB. We also add a Refine Residual Block (RRB) to further refine the result. Finally, without lowering the frame rate (FPS), DAFNet achieves significant accuracy improvements on the Cityscapes and CamVid datasets.
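    The dual-input attention idea summarized above can be sketched in plain NumPy. This is only an illustrative sketch under stated assumptions, not the thesis's actual implementation: the function names, the SE-style sigmoid gating for the channel-wise module, and the non-local affinity matrix for the point-wise module are common formulations of these attention types, and the sketch omits the learned convolutions, the PFB, the asymmetric reduction, and the RRB that the thesis describes.

    ```python
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def channel_attention(feat):
        # feat: (C, H, W). Squeeze spatial dims to one weight per channel,
        # then gate each channel (squeeze-and-excitation style).
        w = sigmoid(feat.mean(axis=(1, 2)))            # (C,)
        return feat * w[:, None, None]

    def pointwise_attention(feat):
        # feat: (C, H, W). Non-local-style attention over all H*W positions:
        # every pixel is re-expressed as a weighted sum of all pixels.
        c, h, w = feat.shape
        x = feat.reshape(c, h * w)                     # (C, N)
        attn = softmax(x.T @ x / np.sqrt(c), axis=-1)  # (N, N) position affinities
        return (x @ attn.T).reshape(c, h, w)

    def dual_attention_block(low_level, high_level):
        # Dual input: low-level detail plus high-level semantics
        # (assumed already resized to the same (C, H, W) shape).
        fused = low_level + high_level
        return channel_attention(fused) + pointwise_attention(fused)
    ```

    The asymmetric non-local block cited in the abstract would shrink the (N, N) affinity matrix by sampling a smaller set of key positions, which is where the DAB's computational savings would come from.
    
    
    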

    Contents
    Chapter 1. Introduction
    Chapter 2. Related Works
    Chapter 3. The Proposed Method
    Chapter 4. Experiment Results
    Chapter 5. Conclusion
    References

    References
    [1] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
    [2] L. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” CoRR, vol. abs/1706.05587, 2017.
    [3] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
    [4] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, “Learning a discriminative feature network for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
    [5] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Proceedings of the Neural Information and Processing Systems, 2012.
    [6] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proceedings of the International Conference on Learning Representations, 2015.
    [7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
    [8] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241.
    [9] L. Chen et al., “SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
    [10] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” in Proceedings of the European Conference on Computer Vision, 2018, pp. 3–19.
    [11] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
    [12] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” CoRR, vol. abs/1805.08318, 2018.
    [13] J. Fu, J. Liu, H. Tian, and Y. Li, “Dual attention network for scene segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
    [14] A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “Enet: A deep neural network architecture for real-time semantic segmentation,” CoRR, vol. abs/1606.02147, 2016.
    [15] S. Mehta, M. Rastegari, A. Caspi, L. Shapiro, and H. Hajishirzi, “ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation,” in Proceedings of the European Conference on Computer Vision, 2018.
    [16] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, “BiSeNet: Bilateral segmentation network for real-time semantic segmentation,” in Proceedings of the European Conference on Computer Vision, 2018.
    [17] X. Ding, Y. Guo, G. Ding, and J. Han, “ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks,” in Proceedings of the IEEE International Conference on Computer Vision, 2019.
    [18] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” arXiv preprint arXiv:1511.00561, 2015.
    [19] Z. Zhu, M. Xu, S. Bai, T. Huang, and X. Bai, “Asymmetric non-local neural networks for semantic segmentation,” in Proceedings of the IEEE International Conference on Computer Vision, 2019.
    [20] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
    [22] A. G. Howard et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861v1 [cs.CV], 2017.
    [23] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
    [24] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” arXiv preprint arXiv:1606.00915, 2016.
    [25] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in Proceedings of the International Conference on Learning Representations, 2016.
    [26] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in Proceedings of the European Conference on Computer Vision, 2014, pp. 346–361.
    [28] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the International Conference on Machine Learning, 2015.
    [29] W. Liu, A. Rabinovich, and A. C. Berg, “ParseNet: Looking wider to see better,” in Proceedings of the International Conference on Learning Representations, 2016.
    [30] M. Cordts, et al., “The cityscapes dataset for semantic urban scene understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
    [31] G. Brostow, J. Shotton, J. Fauqueur, and R. Cipolla, “Segmentation and recognition using structure from motion point clouds,” in Proceedings of the European Conference on Computer Vision, 2008.
    [32] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the International Conference on Machine Learning, 2010.
    [33] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv preprint arXiv:1207.0580, 2012.
    [34] G. Li, I. Yun, J. Kim, and J. Kim, “DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation,” in Proceedings of the British Machine Vision Conference, 2019, pp. 1–12.
    [35] H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, “ICNet for real-time semantic segmentation on high-resolution images,” in Proceedings of the European Conference on Computer Vision, 2018.
    [36] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in Proceedings of International Conference on 3D Vision, 2016.
