Author: 葉哲欣 (Yeh, Che-Hsin)
Title: 利用邊界引導信息改善查詢式實例分割 (Improving Query based Instance Segmentation with Boundary-Guide Information)
Advisor: 張隆紋 (Chang, Long-Wen)
Oral defense committee: 陳朝欽 (Chen, Chaur-Chin); 陳永昌 (Chen, Yung-Chang)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2022
Academic year: 110
Language: English
Pages: 34
Keywords: Computer Vision, Image Segmentation, Instance Segmentation
Instance segmentation is a well-developed task in computer vision. Previous state-of-the-art methods tend to be built on fully convolutional networks (FCNs) [17]; for example, two-stage segmentation frameworks in the Mask R-CNN [1] family adopt an efficient detector to detect objects and utilize an FCN to generate pixel-wise segmentation masks. Recently, query-based instance segmentation frameworks [2] have outperformed detect-then-segment methods and achieved impressive performance. However, coarse mask predictions and imprecise localization remain common problems in these methods, since boundary information is lost after a series of downsampling operations.
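The claim that downsampling destroys boundary detail can be illustrated with a small NumPy sketch (purely illustrative, not the thesis code): nearest-neighbour downsampling followed by upsampling of a toy instance mask leaves the interior intact but corrupts pixels along the contour.

```python
import numpy as np

def down_up(mask, factor):
    """Nearest-neighbour downsample by `factor`, then upsample back."""
    small = mask[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

# Toy instance mask: a filled disk on a 64x64 grid.
yy, xx = np.mgrid[:64, :64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 <= 20 ** 2  # boolean mask

coarse = down_up(mask, 8)
# Pixels whose label changed; these cluster along the object boundary,
# because an 8x8 grid of uniform blocks cannot encode the curved contour.
changed = mask != coarse
err_rate = changed.mean()
```

A mask head that predicts at 1/8 resolution and upsamples faces the same limitation, which is why boundary cues help localization.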
Therefore, in this thesis we propose a method that improves query-based instance segmentation with boundary-guided information, named BQInst (Boundary information for Query-based Instance segmentation). By fusing object boundary information and adopting an iterative updating strategy, we improve mask localization and generate fine-grained masks.
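One common way to obtain a boundary target from an instance mask, used by boundary-aware methods such as [3], is to subtract an eroded copy of the mask from the mask itself. The following is a minimal NumPy sketch of that construction, not the BQInst implementation:

```python
import numpy as np

def erode(mask):
    """One step of 4-connected binary erosion."""
    p = np.pad(mask, 1, constant_values=False)
    return (mask & p[:-2, 1:-1] & p[2:, 1:-1]
                 & p[1:-1, :-2] & p[1:-1, 2:])

def boundary_target(mask, width=1):
    """Boundary band: the mask minus its `width`-step erosion."""
    eroded = mask
    for _ in range(width):
        eroded = erode(eroded)
    return mask & ~eroded

# Toy instance mask: a filled disk on a 32x32 grid.
yy, xx = np.mgrid[:32, :32]
mask = (yy - 16) ** 2 + (xx - 16) ** 2 <= 10 ** 2  # boolean mask

band = boundary_target(mask, width=1)
```

Such a band can supervise a boundary head or up-weight the mask loss near object contours; in an iterative pipeline the target would be recomputed from the refined prediction at each stage.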
We demonstrate the effectiveness of the proposed modules through a series of experiments. Our approach achieves 35.3 box AP and 32.4 mask AP on the COCO validation set. Moreover, under the same training strategy, our method improves on the baseline model, 3-stage QueryInst (Instances as Queries) [2], by 0.3 box AP and 1.1 mask AP.
[1] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in ICCV, 2017, pp. 2980-2988.
[2] Y. Fang, S. Yang, X. Wang, Y. Li, C. Fang, Y. Shan, B. Feng, and W. Liu, "Instances as Queries," in ICCV, 2021, pp. 6890-6899.
[3] T. Cheng, X. Wang, L. Huang, and W. Liu, "Boundary-Preserving Mask R-CNN," in ECCV, 2020, pp. 660-676.
[4] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-End Object Detection with Transformers," in ECCV, 2020.
[5] P. Sun, R. Zhang, Y. Jiang, T. Kong, C. Xu, W. Zhan, M. Tomizuka, L. Li, Z. Yuan, C. Wang, and P. Luo, "Sparse R-CNN: End-to-End Object Detection with Learnable Proposals," in CVPR, 2021, pp. 14449-14458.
[6] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, "YOLACT: Real-Time Instance Segmentation," in ICCV, 2019, pp. 9156-9165.
[7] H. Chen, K. Sun, Z. Tian, C. Shen, Y. Huang, and Y. Yan, "BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation," in CVPR, 2020, pp. 8570-8578.
[8] X. Wang, R. Zhang, C. Shen, T. Kong, and L. Li, "SOLO: A Simple Framework for Instance Segmentation," IEEE Trans. Pattern Anal. Mach. Intell., 2021.
[9] Z. Tian, C. Shen, H. Chen, and T. He, "FCOS: Fully Convolutional One-Stage Object Detection," in ICCV, 2019, pp. 9626-9635.
[10] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137-1149, 2017.
[11] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," in ICCV, 2017, pp. 2999-3007.
[12] Z. Tian, C. Shen, and H. Chen, "Conditional Convolutions for Instance Segmentation," in ECCV, 2020.
[13] Y. Yuan, J. Xie, X. Chen, and J. Wang, "SegFix: Model-Agnostic Boundary Refinement for Segmentation," in ECCV, 2020.
[14] H. He, X. Li, K. Yang, G. Cheng, J. Shi, Y. Tong, Z. Zha, and L. Weng, "BoundarySqueeze: Image Segmentation as Boundary Squeezing," arXiv:2105.11668, 2021.
[15] G. Zhang, X. Lu, J. Tan, J. Li, Z. Zhang, Q. Li, and X. Hu, "RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features," in CVPR, 2021, pp. 6857-6865.
[16] J. Fu, J. Liu, H. Tian, Z. Fang, and H. Lu, "Dual Attention Network for Scene Segmentation," in CVPR, 2019, pp. 3141-3149.
[17] E. Shelhamer, J. Long, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, pp. 640-651, 2017.
[18] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," in CVPR, 2014, pp. 580-587.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All You Need," arXiv:1706.03762, 2017.
[20] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in CVPR, 2016, pp. 770-778.
[21] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature Pyramid Networks for Object Detection," in CVPR, 2017, pp. 936-944.
[22] C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. J. Cardoso, "Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations," in DLMIA/ML-CDS Workshops at MICCAI, 2017, pp. 240-248.
[23] S. H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. D. Reid, and S. Savarese, "Generalized Intersection over Union: A Metric and a Loss for Bounding Box Regression," in CVPR, 2019, pp. 658-666.
[24] Z. Cai and N. Vasconcelos, "Cascade R-CNN: High Quality Object Detection and Instance Segmentation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, pp. 1483-1498, 2021.