
Student: Lo, Kai-Chieh (羅楷傑)
Thesis title: Bi-Attention Network for Real-time Semantic Segmentation (基於雙注意網路的實時語意分割)
Advisor: Chang, Long-Wen (張隆紋)
Committee members: Chen, Chaur-Chin (陳朝欽); Chiu, Ching-Te (邱瀞德)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2019
Academic year of graduation: 108
Language: English
Number of pages: 40
Keywords (Chinese): 實時, 注意力, 語意分割
Keywords (English): real-time, attention, semantic segmentation


    In recent years, semantic segmentation has become a very important topic in image processing. Most mobile applications of semantic segmentation must run in real time, so we call this task real-time semantic segmentation. In real-time semantic segmentation, there is always a difficult trade-off between accuracy and inference time.
    In 2018, Yu et al. [3] introduced BiSeNet, which improved both accuracy and speed through a unique two-path architecture. Building on that work, we design a more efficient network by removing unnecessary blocks from BiSeNet and adding refinement modules. The result, our Bi-Attention Network, uses efficient attention modules and truncates the branches connected from the shallow layers. It achieves 70.6% mean IoU (mean intersection over union) on the Cityscapes [4] test set at 80.64 FPS (frames per second), improvements of 1% and 13% over BiSeNet, while also reducing memory usage by 11%.
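    The mean-IoU score quoted above is the standard semantic-segmentation metric: the per-class intersection over union of predicted and ground-truth pixels, averaged across classes. A minimal plain-Python sketch over flattened label maps follows; the cityscapesScripts tool [14] is the authoritative evaluator, and the `ignore_index` default of 255 for void pixels is a common convention assumed here, not a detail from this thesis.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection over union between flattened label maps.

    pred, target: sequences of integer class labels, one per pixel.
    Pixels labelled `ignore_index` in the ground truth are excluded.
    """
    pairs = [(p, t) for p, t in zip(pred, target) if t != ignore_index]
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Example: two classes over six pixels.
# Class 0: intersection 1, union 2 -> 0.5; class 1: intersection 4, union 5 -> 0.8.
print(mean_iou([0, 0, 1, 1, 1, 1], [0, 1, 1, 1, 1, 1], 2))  # 0.65
```

    Averaging per-class IoU rather than overall pixel accuracy is what makes the metric sensitive to small or rare classes, which is why it is the headline number on the Cityscapes benchmark.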

    Chapter 1. Introduction - 1
    Chapter 2. Related Work - 4
    Chapter 3. The Proposed Method - 11
    Chapter 4. Experiments and Results - 22
    Chapter 5. Conclusions - 37
    References - 38

    [1] Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2881-2890).
    [2] Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E. (2016). Enet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147.
    [3] Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018). Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 325-341).
    [4] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3213-3223).
    [5] Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431-3440).
    [6] Zhao, H., Qi, X., Shen, X., Shi, J., and Jia, J. (2018). Icnet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 405-420).
    [7] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4), 834-848.
    [8] He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
    [9] Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1251-1258).
    [10] Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019). Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3146-3154).
    [11] Li, H., Xiong, P., An, J., and Wang, L. (2018). Pyramid attention network for semantic segmentation. arXiv preprint arXiv:1805.10180.
    [12] Woo, S., Park, J., Lee, J. Y., and So Kweon, I. (2018). Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 3-19).
    [13] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017). Automatic differentiation in PyTorch.
    [14] Cordts, M. (2017). cityscapesScripts. https://github.com/mcordts/cityscapesScripts
    [15] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., and Fei-Fei, L. (2009, June). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248-255). IEEE.
    [16] Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010 (pp. 177-186). Physica-Verlag HD.
    [17] Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., ... and Renduchintala, A. (2018). Espnet: End-to-end speech processing toolkit. arXiv preprint arXiv:1804.00015.
    [18] Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
    [19] Brostow, G. J., Shotton, J., Fauqueur, J., and Cipolla, R. (2008). Segmentation and recognition using structure from motion point clouds. In European Conference on Computer Vision (pp. 44-57).
