
Author: 劉士弘
Liu, Shih-Hung
Thesis title: 利用高斯分布學習實例分割
Learning Gaussian Instance Segmentation in Point Clouds
Advisor: 陳煥宗
Chen, Hwann-Tzong
Committee members: 林彥宇
Lin, Yen-Yu
陳嘉平
Chen, Chia-Ping
劉庭祿
Liu, Tyng-Luh
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Publication year: 2020
Academic year of graduation: 108 (ROC calendar)
Language: Chinese
Pages: 34
Keywords: 實例分割; instance segmentation
  • This paper presents a novel method for instance segmentation of 3D point clouds. The proposed Gaussian Instance Center Network (GICN) approximates the distribution of instance centers scattered across the whole scene as Gaussian center heatmaps. From the predicted heatmaps, a small number of center candidates can be selected easily, which keeps the subsequent predictions efficient: i) predicting the instance size for each center to decide the range for extracting features, ii) generating a bounding box for each center, and iii) producing the final instance masks. GICN is a single-stage, anchor-free, end-to-end architecture that is easy to train and efficient at inference. Benefiting from the center-dictated mechanism with adaptive instance size selection, the method achieves state-of-the-art performance in 3D instance segmentation on the ScanNet and S3DIS datasets.
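    The heatmap-and-candidate idea in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the record does not give the heatmap formula or the selection rule, so the Gaussian form, the way sigma is tied to instance extent, and every name and parameter below (`gaussian_center_heatmap`, `select_centers`, `sigma_scale`, `theta`, `min_dist`, `max_centers`) are assumptions made for illustration. In GICN the heatmap is predicted by a network; here it is computed from ground-truth instance labels only to show what such a target could look like.

```python
import math


def centroid(pts):
    """Mean of a list of 3D points."""
    n = len(pts)
    return tuple(sum(p[k] for p in pts) / n for k in range(3))


def gaussian_center_heatmap(points, instance_ids, sigma_scale=0.5):
    """Per-point score exp(-d^2 / (2 sigma^2)), d = distance to the instance centroid.

    Assumption: sigma is sigma_scale times the largest point-to-centroid
    distance of the instance, so larger instances get wider Gaussians.
    """
    groups = {}
    for p, i in zip(points, instance_ids):
        groups.setdefault(i, []).append(p)

    stats = {}
    for i, pts in groups.items():
        c = centroid(pts)
        radius = max(math.dist(p, c) for p in pts)
        stats[i] = (c, max(sigma_scale * radius, 1e-6))  # guard against sigma = 0

    heat = []
    for p, i in zip(points, instance_ids):
        c, sigma = stats[i]
        heat.append(math.exp(-math.dist(p, c) ** 2 / (2 * sigma ** 2)))
    return heat


def select_centers(points, heat, max_centers=16, theta=0.5, min_dist=1.0):
    """Greedy pick of high-scoring points (hypothetical selection rule).

    Points are visited in descending score order; a point is kept if its
    score is at least theta and it lies at least min_dist from every
    already-kept point. At most max_centers indices are returned.
    """
    order = sorted(range(len(points)), key=lambda j: heat[j], reverse=True)
    chosen = []
    for j in order:
        if heat[j] < theta or len(chosen) >= max_centers:
            break
        if all(math.dist(points[j], points[k]) >= min_dist for k in chosen):
            chosen.append(j)
    return chosen


# Toy scene: two instances, three collinear points each.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (12, 0, 0)]
instance_ids = [0, 0, 0, 1, 1, 1]
heat = gaussian_center_heatmap(points, instance_ids)
centers = select_centers(points, heat)
print(centers)  # indices of the selected center candidates
```

    In this toy scene the middle point of each instance coincides with its centroid and therefore scores 1.0, so one candidate per instance is selected. A size-prediction step, as described in the abstract, would then decide how large a neighborhood to gather around each selected candidate.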

    Contents
    List of Tables
    List of Figures
    摘要
    Abstract
    1 Introduction
    2 Related Work
    3 Our Approach
      3.1 Method
        3.1.1 Center Prediction Network ΦC
        3.1.2 Bounding-Box Prediction Network ΦB
        3.1.3 Mask Prediction Network ΦM
        3.1.4 Loss Functions
    4 Experiments
      4.1 Settings
        4.1.1 Datasets
        4.1.2 Implementation Details
        4.1.3 Evaluation on S3DIS Dataset
        4.1.4 Evaluation on ScanNet Dataset
        4.1.5 Ablation Study
        4.1.6 Discussions
    5 Conclusion and Future Work
    6 Bibliography

    List of Tables
    4.1 Comparisons on S3DIS instance segmentation (6-fold cross validation).
    4.2 ScanNet v2 instance segmentation online benchmark. The table shows the AP@50% score of each semantic class. Our method achieves the best mean AP@50% performance among all existing methods published in the literature.
    4.3 Ablation study on Area-5 of the S3DIS dataset (†: random selection. ∗: top Tθ).
    4.4 The timing results on the ScanNet v2 validation split (312 scenes).

    List of Figures
    1-1 An overview of GICN. The global and local features are extracted from the input point cloud and then passed through the center prediction network (①) to generate the Gaussian approximation heatmap. We use a center selection mechanism to choose a small number of probable candidates, which will yield the bounding boxes and the instance masks using the bounding box prediction network (②) and the mask prediction network (③).
    3-1 Visualization of predicted and ground-truth center heatmaps on ScanNet.
    3-2 Bounding-box prediction network ΦB. The network first predicts the instance size for each of the T selected centers, and then uses a shared PointNet++ network to extract features from the point cloud within the neighborhood of the predicted size. The extracted local features combined with the global features will go through convolutional layers to predict T bounding boxes.
    4-1 Results on the S3DIS dataset. The first column shows the input point clouds. The second column depicts the predicted masks. The third column shows the ground-truth masks. Note that the color code assigned to each instance does not have to match the ground truth. Only the structure of the mask matters.
    4-2 Qualitative results on the validation split of the ScanNet v2 dataset. Different colors indicate different instances. More results are in the supplementary material.
    4-3 Comparison with 3D-BoNet on the validation split of the ScanNet v2 dataset. The red circles show some examples that 3D-BoNet fails to segment but for which the proposed GICN successfully produces the instance masks.

