Graduate Student: | 劉亮昆 Liu, Liang-Kun
---|---
Thesis Title: | OHDBNet: Octave Half Dilated Bottleneck Network with Mixed Loss for Real-Time Semantic Segmentation
Advisor: | 張隆紋 Chang, Long-Wen
Committee Members: | 陳朝欽 Chen, Chaur-Chin; 邱瀞德 Chiu, Ching-Te
Degree: | Master
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: | 2020
Academic Year: | 108
Language: | English
Pages: | 26
Keywords: | real-time, semantic segmentation
As a pixel-level prediction task, semantic segmentation usually requires sufficiently powerful hardware to handle its heavy computation in order to obtain good results. However, with the recent rise of research on robotics and autonomous driving, deep learning networks must run in real time on limited resources, so it is necessary to strike a balance between accuracy and speed. Most recent real-time semantic segmentation networks are convolutional neural networks (CNNs), and some of them use efficient modules to increase inference speed.
In this thesis, we propose the Octave Half Dilated Bottleneck Network (OHDBNet). OHDBNet is based on the structure of the depth-wise asymmetric bottleneck network (DABNet) [1], combined with the Octave Half Dilated Bottleneck (OHDB) module we designed. In the OHDB module, we use a bottleneck architecture to reduce the number of parameters and octave convolution to reduce the number of floating-point operations; we then use dilated convolution to enlarge the receptive field so the network can better exploit the contextual information of an image. In addition, we train the network with a mixed loss function so that it predicts results more accurately.
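The savings described above can be checked with back-of-the-envelope arithmetic: octave convolution stores the low-frequency fraction α of the channels at half resolution, so three of its four convolution paths run on quarter-size feature maps, and a dilated kernel covers a wider span at no extra cost. The sketch below is purely illustrative (not the thesis's implementation); the channel counts, spatial sizes, and α = 0.5 are assumptions for the example.

```python
def conv_flops(c_in, c_out, k, h, w):
    """Multiply-accumulate count of a plain k x k convolution over an h x w map."""
    return c_in * c_out * k * k * h * w

def octave_conv_flops(c_in, c_out, k, h, w, alpha=0.5):
    """Cost of the four octave-convolution paths, with the low-frequency
    fraction alpha of the channels kept at half spatial resolution."""
    hi_in, lo_in = round((1 - alpha) * c_in), round(alpha * c_in)
    hi_out, lo_out = round((1 - alpha) * c_out), round(alpha * c_out)
    full, half = h * w, (h // 2) * (w // 2)
    return (hi_in * hi_out * k * k * full      # high -> high, full resolution
            + hi_in * lo_out * k * k * half    # high -> low (after pooling)
            + lo_in * hi_out * k * k * half    # low -> high (before upsampling)
            + lo_in * lo_out * k * k * half)   # low -> low, half resolution

def effective_kernel(k, dilation):
    """Span covered by a k x k kernel with the given dilation rate."""
    return dilation * (k - 1) + 1

if __name__ == "__main__":
    vanilla = conv_flops(64, 64, 3, 128, 256)
    octave = octave_conv_flops(64, 64, 3, 128, 256, alpha=0.5)
    print(f"octave / vanilla FLOPs: {octave / vanilla:.4f}")   # 0.4375 at alpha = 0.5
    print("3x3 kernel, dilation 2 spans", effective_kernel(3, 2))  # 5
```

At α = 0.5 the per-layer cost drops to about 44% of a vanilla convolution while the parameter count is unchanged, which is consistent with the motivation for using octave convolution in the OHDB module.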
We evaluate our network on the CamVid [17] and Cityscapes [18] datasets and measure FPS (frames per second) on a single GTX 1080Ti. Our network has only 0.38M (million) parameters, and it achieves 63.78% mIoU at 207.2 FPS on CamVid and 67.92% mIoU at 155.3 FPS on Cityscapes. These results show that our module and architecture modifications effectively reduce the number of parameters while keeping the network fast and sufficiently accurate, so our network can be applied to real-time semantic segmentation.
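The mIoU figures above are per-class intersection-over-union averaged over the classes present in the data. A minimal NumPy sketch of that metric is given below; the two 2×3 label maps are toy data, not CamVid or Cityscapes outputs.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two integer label maps.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class does not appear at all
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

if __name__ == "__main__":
    target = np.array([[0, 0, 1], [1, 2, 2]])
    pred = np.array([[0, 1, 1], [1, 2, 2]])  # one pixel of class 0 mislabeled as 1
    print(f"mIoU = {mean_iou(pred, target, 3):.4f}")  # mIoU = 0.7222
```

Benchmark implementations (e.g. the official Cityscapes scripts) accumulate a confusion matrix over the whole test set rather than averaging per image, but the per-class IoU definition is the same.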
[1] Gen Li, Inyoung Yun, Jonghyun Kim, and Joongkyu Kim, “DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation,” In BMVC, 2019.
[2] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan Loddon Yuille, “Semantic image segmentation with deep convolutional nets and fully connected crfs,” CoRR, abs/1412.7062, 2015.
[3] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam, “Rethinking atrous convolution for semantic image segmentation,” CoRR, abs/1706.05587, 2017.
[4] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan Loddon Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 40:834–848, 2018.
[5] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” In ECCV, 2018.
[6] Yi-Hsuan Tsai, Wei-Chih Hung, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang, Manmohan Chandraker, “Learning to Adapt Structured Output Space for Semantic Segmentation,” arXiv:1802.10349, 2018.
[7] Chen-Wei Xie, Hong-Yu Zhou, and Jianxin Wu, “Vortex Pooling: Improving Context Representation in Semantic Segmentation,” arXiv:1804.06242v2, 2018.
[8] Adam Paszke, Abhishek Chaurasia, Sangpil Kim, and Eugenio Culurciello, “ENet: A deep neural network architecture for real-time semantic segmentation,” arXiv:1606.02147v1, 2016.
[9] Sachin Mehta, Mohammad Rastegari, Anat Caspi, Linda Shapiro, and Hannaneh Hajishirzi, “ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation,” In ECCV, 2018.
[10] Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, and Jiashi Feng, “Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution,” In ICCV, 2019.
[11] Matthias Holschneider, Richard Kronland-Martinet, Jean Morlet, and Ph. Tchamitchian, “A real-time algorithm for signal analysis with the help of the wavelet transform,” In Wavelets, pages 286–297. Springer, 1990.
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
[14] J. Cheng, L. Dong, and M. Lapata, “Long short-term memory-networks for machine reading,” arXiv:1601.06733, 2016.
[15] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár, “Focal Loss for Dense Object Detection,” arXiv:1708.02002v2, 2018.
[16] Maxim Berman, Amal Rannen Triki, and Matthew B. Blaschko, “The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks,” In CVPR, 2018.
[17] Gabriel J. Brostow, Julien Fauqueur, and Roberto Cipolla, “Semantic object classes in video: A high-definition ground truth database,” Pattern Recognition Letters, 2009.
[18] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele, “The cityscapes dataset for semantic urban scene understanding,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3213–3223, 2016.
[19] Diederik P. Kingma, and Jimmy Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980, 2014.
[20] Eduardo Romera, Jose M. Alvarez, Luis M. Bergasa, and Roberto Arroyo, “ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation,” In CVPR, 2018.
[21] Mengyu Liu, and Hujun Yin, “Feature Pyramid Encoding Network for Real-time Semantic Segmentation,” In BMVC, 2019.
[22] Alex Kendall, “SegNet and Bayesian SegNet Tutorial,” http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html
[23] Tianyi Wu, Sheng Tang, Rui Zhang, and Yongdong Zhang, “CGNet: A Light-weight Context Guided Network for Semantic Segmentation,” arXiv:1811.08201v2, 2019.