Graduate Student: 李佳陽 Lee, Jia-Yang
Thesis Title: PAMNet: Panoptic Aware Merge Network with Coarse-to-Fine Segmentation (具有粗到精分割的全景感知合併網路)
Advisor: 張隆紋 Chang, Long-Wen
Committee Members: 陳永昌 Chen, Yung-Chang; 陳朝欽 Chen, Chaur-Chin
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110
Language: English
Number of Pages: 39
Keywords: Computer Vision, Deep Learning, Image Segmentation, Panoptic Segmentation
Abstract:
In the past two years, several simple, unified frameworks have adopted dynamic kernels or learnable queries for end-to-end panoptic segmentation. A specially designed update strategy lets each kernel or query dynamically correspond to a meaningful group of pixels in the input image. However, these methods are computationally expensive because they rely on the self-attention mechanism. In contrast, we propose a new strategy, the Panoptic Aware Merge (PAM) module, built on the concept of kernel reuse: the predicted kernels are applied repeatedly to extract the global feature. We also incorporate panoptic boundaries into the PAM to enhance the perception of structural cues. Based on the PAM, we present a coarse-to-fine panoptic segmentation network named the Panoptic Aware Merge Network (PAMNet). Motivated by the success of context modules, we propose a context information extractor, the Deep Feature Pyramid Pooling Module (DFPPM), to improve performance. Furthermore, for time efficiency, we introduce a fast feature fusion module, the Efficient Cross-Layer Channel Attention module, which aggregates multi-scale features with channel attention to capture the distinct channel dependencies of each scale. To validate the effectiveness of the proposed method, we conduct several experiments on the COCO dataset. Our model achieves 43.0% PQ (panoptic quality) at 34.8 FPS (frames per second) on a single RTX 3080 Ti GPU, combining fast inference with strong performance.
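The two ideas the abstract leans on can be illustrated with a minimal PyTorch sketch: per-scale channel attention in the style of ECA-Net for cross-layer fusion, and a mask head that reuses a fixed set of kernels on a coarse and then a refined feature instead of updating the kernels again with self-attention. The class names (EcaChannelAttention, KernelReuseMaskHead), the learnable-embedding stand-in for the predicted kernels, the refinement layer, and all sizes are illustrative assumptions, not the thesis' actual PAMNet implementation.

```python
# Minimal sketch of kernel reuse and ECA-style cross-layer channel attention.
# All module names, sizes, and the refinement step are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EcaChannelAttention(nn.Module):
    """ECA-style channel attention: a 1-D convolution over globally pooled
    channel statistics, applied here once per feature scale (assumption)."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> channel weights (B, C, 1, 1), then rescale x.
        w = F.adaptive_avg_pool2d(x, 1)                 # (B, C, 1, 1)
        w = self.conv(w.squeeze(-1).transpose(1, 2))    # (B, 1, C)
        w = torch.sigmoid(w.transpose(1, 2).unsqueeze(-1))
        return x * w


class KernelReuseMaskHead(nn.Module):
    """Coarse-to-fine masks from one fixed set of kernels: the same kernels
    are reused on a lightly refined feature instead of being updated again
    with self-attention (the kernel-reuse idea)."""

    def __init__(self, channels: int = 256, num_kernels: int = 100):
        super().__init__()
        self.kernels = nn.Embedding(num_kernels, channels)  # stand-in for predicted kernels
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GroupNorm(32, channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat: torch.Tensor):
        b = feat.size(0)
        k = self.kernels.weight.unsqueeze(0).expand(b, -1, -1)      # (B, N, C)
        coarse = torch.einsum("bnc,bchw->bnhw", k, feat)             # coarse masks
        fine = torch.einsum("bnc,bchw->bnhw", k, self.refine(feat))  # same kernels, refined feature
        return coarse.sigmoid(), fine.sigmoid()


if __name__ == "__main__":
    feats = [torch.randn(2, 256, s, s) for s in (64, 32, 16)]  # toy multi-scale features
    eca = EcaChannelAttention()
    # Cross-layer fusion: per-scale channel attention, then upsample and sum.
    fused = sum(
        F.interpolate(eca(f), size=(64, 64), mode="bilinear", align_corners=False)
        for f in feats
    )
    coarse, fine = KernelReuseMaskHead()(fused)
    print(coarse.shape, fine.shape)  # torch.Size([2, 100, 64, 64]) each
```

The efficiency argument in the abstract corresponds to the second einsum: producing the fine-stage masks costs only a dynamic 1x1 convolution with already-predicted kernels, rather than another round of self-attention kernel updates.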