Student: 羅楷傑 Lo, Kai-Chieh
Thesis title: 基於雙注意網路的實時語意分割 (Bi-Attention Network for Real-time Semantic Segmentation)
Advisor: 張隆紋 Chang, Long-Wen
Committee members: 陳朝欽 Chen, Chaur-Chin; 邱瀞德 Chiu, Ching-Te
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2019
Academic year of graduation: 108
Language: English
Pages: 40
Keywords (Chinese): 實時、注意力、語意分割
Keywords (English): real-time, attention, semantic segmentation
In recent years, semantic segmentation has become a very important topic in image processing. Mobile applications of semantic segmentation need to run in real time, which we call real-time semantic segmentation. In real-time semantic segmentation, balancing accuracy and processing speed has long been a difficult problem.

In 2018, Yu et al. [3] proposed BiSeNet, which used a distinctive two-path architecture to improve both accuracy and processing speed. Building on that work, we aim to create a more efficient network by removing unnecessary blocks and adding refine modules. We therefore design the Bi-Attention Network, which uses efficient attention modules and prunes the branch connected from the shallow layers. The Bi-Attention Network achieves 70.6% mean-IoU (mean intersection over union) and 80.64 FPS (frames per second) on the Cityscapes test set [4], improvements of 1% and 13% over BiSeNet, respectively. In addition, we reduce memory usage by 11%.
In recent years, semantic segmentation has become a very important issue in image processing. Most mobile applications of semantic segmentation require real-time inference; we call this real-time semantic segmentation. In real-time semantic segmentation, there is always a difficult trade-off between accuracy and inference time.

In 2018, Yu et al. [3] introduced BiSeNet, which improved both its accuracy and speed through a unique two-path architecture. Based on this work, we design an improved network derived from BiSeNet by removing unnecessary blocks and adding more refining modules. We therefore propose the Bi-Attention Network, which utilizes efficient attention modules and truncates branches from earlier layers. Our network achieves 70.6% mean-IoU (mean intersection over union) on the Cityscapes dataset [4] at 80.64 FPS (frames per second), improvements of 1% and 13% over BiSeNet, respectively. Besides these, we also save 11% of the memory usage.
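The accuracy figure above is mean-IoU, the standard evaluation metric on Cityscapes. As a reference for how that metric is defined (this is not code from the thesis, just a minimal NumPy sketch of the usual per-class intersection-over-union averaged over classes):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over label maps.

    Per class c: IoU_c = |pred==c AND target==c| / |pred==c OR target==c|.
    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c)
        target_c = (target == c)
        intersection = np.logical_and(pred_c, target_c).sum()
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:  # class not present at all; do not count it
            continue
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with 2 classes.
pred = np.array([[0, 0, 1],
                 [1, 1, 1]])
target = np.array([[0, 0, 1],
                   [0, 1, 1]])
# class 0: intersection 2, union 3; class 1: intersection 3, union 4
print(mean_iou(pred, target, 2))  # (2/3 + 3/4) / 2 ≈ 0.7083
```

The official Cityscapes evaluation additionally ignores void labels, but the core computation is the same class-averaged IoU shown here.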
[1] Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2881-2890).
[2] Paszke, A., Chaurasia, A., Kim, S., and Culurciello, E. (2016). Enet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147.
[3] Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018). Bisenet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 325-341).
[4] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3213-3223).
[5] Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431-3440).
[6] Zhao, H., Qi, X., Shen, X., Shi, J., and Jia, J. (2018). Icnet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 405-420).
[7] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2017). Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4), 834-848.
[8] He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
[9] Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1251-1258).
[10] Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019). Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3146-3154).
[11] Li, H., Xiong, P., An, J., and Wang, L. (2018). Pyramid attention network for semantic segmentation. arXiv preprint arXiv:1805.10180.
[12] Woo, S., Park, J., Lee, J. Y., and So Kweon, I. (2018). Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 3-19).
[13] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017). Automatic differentiation in PyTorch. In NIPS Autodiff Workshop.
[14] Cordts, M. (2017). cityscapesScripts. https://github.com/mcordts/cityscapesScripts
[15] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., and Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248-255). IEEE.
[16] Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010 (pp. 177-186). Physica-Verlag HD.
[17] Mehta, S., Rastegari, M., Caspi, A., Shapiro, L., and Hajishirzi, H. (2018). ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 552-568).
[18] Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
[19] Brostow, G. J., Shotton, J., Fauqueur, J., and Cipolla, R. (2008). Segmentation and recognition using structure from motion point clouds. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 44-57).