Graduate Student: 沈柏懷 (Shen, Po-Huai)
Thesis Title: Efficient Real-Time Semantic Segmentation: Enhancing FCHarDNetV2 with Training-Time Techniques and Structural Re-parameterization
Advisor: 林永隆 (Lin, Youn-Long)
Committee Members: 黃俊達 (Huang, Juinn-Dar), 王廷基 (Wang, Ting-Chi), 高肇陽 (Kao, Chao-Yang)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduating Academic Year: 111 (ROC calendar, 2022-2023)
Language: English
Pages: 31
Keywords: deep learning, convolutional neural networks (CNN), real-time semantic segmentation, neural network architecture design
Abstract:
In recent years, extensive research has been dedicated to developing efficient and robust neural networks for real-time semantic segmentation. Many methods focus on designing two-branch or three-branch networks to enhance the network's capacity to capture both shape and semantic information from input images. However, additional branches can significantly increase inference latency. In this paper, we present FCHarDNetV2, an encoder-decoder architecture based on FCHarDNet and designed specifically for inference-time efficiency. FCHarDNetV2 incorporates several training-time-only techniques, such as structural re-parameterization and training-time boundary supervision, which improve overall performance without incurring any additional cost at the inference stage. Our proposed model, FCHarDNetV2-M, achieves results competitive with previous state-of-the-art methods on the Cityscapes dataset: it attains a mean Intersection over Union (mIoU) of 79.5% on the Cityscapes test set while maintaining 49.3 FPS on a single Tesla V100 GPU, demonstrating its effectiveness and efficiency in real-time semantic segmentation. Overall, our contributions are the introduction of FCHarDNetV2, an encoder-decoder architecture optimized for inference time, and the integration of training-time-only techniques to improve accuracy. Experiments on the Cityscapes dataset showcase the competitiveness of our approach in achieving both high accuracy and real-time processing.
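The abstract describes structural re-parameterization only at a high level. As a hedged illustration of the general technique (a minimal RepVGG-style sketch, not the thesis's actual FCHarDNetV2 block; the names `RepBlock` and `fuse_conv_bn` are assumptions), the PyTorch code below trains a two-branch convolution-plus-BatchNorm unit and then folds both branches into a single 3x3 convolution, so the extra branch adds capacity during training but costs nothing at inference:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Fold a BatchNorm layer into the preceding convolution's weight and bias."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std  # per-output-channel scaling factor
    weight = conv.weight * scale.reshape(-1, 1, 1, 1)
    bias = torch.zeros_like(bn.running_mean) if conv.bias is None else conv.bias
    bias = (bias - bn.running_mean) * scale + bn.bias
    return weight, bias


class RepBlock(nn.Module):
    """Two parallel branches (3x3 and 1x1 conv, each with BN) at training time."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)

    def forward(self, x):
        return self.bn3(self.conv3(x)) + self.bn1(self.conv1(x))

    @torch.no_grad()
    def reparameterize(self) -> nn.Conv2d:
        """Collapse both branches into one mathematically equivalent 3x3 conv."""
        w3, b3 = fuse_conv_bn(self.conv3, self.bn3)
        w1, b1 = fuse_conv_bn(self.conv1, self.bn1)
        w1 = F.pad(w1, [1, 1, 1, 1])  # zero-pad the 1x1 kernel to 3x3
        fused = nn.Conv2d(w3.shape[1], w3.shape[0], 3, padding=1, bias=True)
        fused.weight.copy_(w3 + w1)
        fused.bias.copy_(b3 + b1)
        return fused


# Sanity check: in eval mode, the fused conv reproduces the two-branch output.
block = RepBlock(16).eval()
x = torch.randn(1, 16, 32, 32)
assert torch.allclose(block(x), block.reparameterize()(x), atol=1e-5)
```

Because the fusion is exact linear algebra over frozen BatchNorm statistics, the deployed single-convolution network is numerically equivalent to the trained multi-branch one.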
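Training-time boundary supervision follows the same principle: an auxiliary boundary-prediction head is trained jointly with the segmentation head and discarded before deployment, so it adds no inference cost. Below is a minimal sketch of one common realization, not the thesis's exact formulation; the 3x3-neighborhood boundary-target derivation and the loss weight `lambda_b` are illustrative assumptions:

```python
import torch
import torch.nn.functional as F


def boundary_targets(labels: torch.Tensor, ignore_index: int = 255) -> torch.Tensor:
    """Mark pixels whose 3x3 neighborhood spans more than one class as boundary."""
    lab = labels.float().unsqueeze(1)  # (N, 1, H, W)
    local_max = F.max_pool2d(lab, kernel_size=3, stride=1, padding=1)
    local_min = -F.max_pool2d(-lab, kernel_size=3, stride=1, padding=1)
    boundary = (local_max != local_min).float()
    boundary[labels.unsqueeze(1) == ignore_index] = 0.0  # zero out void pixels
    return boundary


def total_loss(seg_logits, boundary_logits, labels, lambda_b: float = 1.0):
    """Main segmentation cross-entropy plus the auxiliary boundary BCE term."""
    seg = F.cross_entropy(seg_logits, labels, ignore_index=255)
    bnd = F.binary_cross_entropy_with_logits(
        boundary_logits, boundary_targets(labels)
    )
    return seg + lambda_b * bnd
```

Here `seg_logits` has shape (N, C, H, W) and `boundary_logits` has shape (N, 1, H, W); after training, only the segmentation path is exported, so the deployed network's latency is unchanged.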