
Graduate Student: Ku, Chun-Chieh (谷俊杰)
Thesis Title: Boosting Unsupervised Domain Adaptation for 3D Object Detection in Point Clouds via 2D Image Semantic Information
Advisor: Lai, Shang-Hong (賴尚宏)
Committee Members: Hsu, Chiou-Ting (許秋婷); Lin, Huei-Yung (林惠勇); Chiang, Chen-Kuo (江振國)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110
Language: English
Number of Pages: 33
Keywords (Chinese): 深度學習, 三維物件偵測, 無監督領域自適應
Keywords (English): Deep learning, 3D object detection, Unsupervised domain adaptation
Abstract:


Both 3D scan data and RGB-D data can be used for 3D object detection, yet a significant geometric bias exists between these two data representations owing to their different reconstruction procedures.
This geometric bias causes a performance drop in cross-domain testing; hence we propose an unsupervised domain adaptation (UDA) framework that leverages annotated data in different data formats for indoor 3D object detection.
Our method inverse-projects the pixel-wise semantic labels predicted from 2D images onto the point clouds for 3D object detection and UDA in both adaptation directions.
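To make the back-projection concrete, here is a minimal NumPy sketch assuming a pinhole camera with known intrinsics and a world-to-camera extrinsic for each registered RGB frame; the function name and calling conventions are illustrative, not taken from the thesis.

```python
import numpy as np

def backproject_semantics(points_world, seg_map, K, world_to_cam):
    """Attach pixel-wise semantic labels to 3D points by projecting
    each point into the image plane of a registered RGB frame.

    points_world: (N, 3) point cloud in world coordinates
    seg_map:      (H, W) integer semantic labels predicted from the 2D image
    K:            (3, 3) camera intrinsic matrix
    world_to_cam: (4, 4) extrinsic matrix mapping world -> camera coordinates
    Returns an (N,) label array; points outside the image or behind
    the camera get label -1.
    """
    H, W = seg_map.shape
    N = points_world.shape[0]

    # Homogeneous transform into the camera frame.
    pts_h = np.concatenate([points_world, np.ones((N, 1))], axis=1)
    pts_cam = (world_to_cam @ pts_h.T).T[:, :3]

    # Perspective projection with the pinhole model: u = fx*x/z + cx, etc.
    z = pts_cam[:, 2]
    valid = z > 1e-6  # keep only points in front of the camera
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / np.maximum(z, 1e-6)).astype(np.int64)
    v = np.round(uv[:, 1] / np.maximum(z, 1e-6)).astype(np.int64)

    # Discard points that project outside the image bounds.
    inside = valid & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    labels = np.full(N, -1, dtype=np.int64)
    labels[inside] = seg_map[v[inside], u[inside]]
    return labels
```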
For the more challenging UDA direction from 3D scans to RGB-D data, we introduce an additional strategy that reduces the domain gap by aligning the features extracted from the two domains with adversarial training.
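One common way to realize such adversarial alignment is a gradient-reversal layer feeding a small domain classifier, in the style of unsupervised domain adaptation by backpropagation; the following PyTorch sketch uses illustrative layer sizes and module names rather than the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the
    backward pass, so the feature extractor is pushed toward
    domain-invariant features while the classifier learns to
    tell the domains apart."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainClassifier(nn.Module):
    """Small MLP that predicts the domain (source vs. target)
    of a pooled point-cloud feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 2),
        )

    def forward(self, feats, lam=1.0):
        return self.net(GradReverse.apply(feats, lam))

# Usage: add the domain loss to the detection loss on each batch.
clf = DomainClassifier(feat_dim=256)
ce = nn.CrossEntropyLoss()
feats_src = torch.randn(8, 256)  # stand-in for source-domain features
feats_tgt = torch.randn(8, 256)  # stand-in for target-domain features
loss_dom = ce(clf(feats_src), torch.zeros(8, dtype=torch.long)) + \
           ce(clf(feats_tgt), torch.ones(8, dtype=torch.long))
loss_dom.backward()
```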
Our method reduces the domain gap between the two types of data and leverages the semantic label information predicted from 2D RGB images to boost the accuracy of the 3D object detection model.
To our knowledge, there are no prior works or common benchmarks on unsupervised domain adaptation for indoor 3D object detection. Thus, we validate our approach with ScanNet and SUN RGB-D, two datasets widely used for indoor 3D object detection, as the source and target datasets in both directions of domain adaptation. Compared with the baseline without any domain adaptation, the proposed method improves mAP@0.25 by 6.4% and 10.3% for the two directions of cross-dataset testing.
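For reference, mAP@0.25 treats a detection as a true positive when its 3D IoU with a same-class, previously unmatched ground-truth box is at least 0.25, then averages the per-class area under the precision-recall curve. Below is a minimal sketch of the IoU test with axis-aligned boxes; the thesis's exact box parameterization and matching protocol may differ.

```python
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """IoU between two axis-aligned 3D boxes given as
    (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo = np.maximum(box_a[:3], box_b[:3])
    hi = np.minimum(box_a[3:], box_b[3:])
    inter = np.prod(np.clip(hi - lo, 0, None))  # overlap volume
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)

# Hypothetical prediction and ground truth: IoU = 0.8 / 1.2 ~ 0.67,
# so this detection would count as a true positive at the 0.25 threshold.
pred = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
gt   = np.array([0.2, 0.0, 0.0, 1.2, 1.0, 1.0])
print(iou_3d_axis_aligned(pred, gt) >= 0.25)  # True
```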

Table of Contents

1 Introduction 1
  1.1 Motivation 1
  1.2 Problem Statement 1
  1.3 Contributions 2
  1.4 Thesis Organization 3
2 Related Work 4
  2.1 Point-based 3D Object Detection 4
  2.2 3D Object Detection and Semantic Segmentation 5
  2.3 Unsupervised Domain Adaptation 5
3 Proposed Method 7
  3.1 Inverse-Projection of 2D Semantic Labels 8
  3.2 3D Object Detection Branch 10
  3.3 Adversarial Training 11
  3.4 Loss Functions 11
4 Experiments 15
  4.1 Dataset 15
  4.2 Evaluation Metrics 16
  4.3 Implementation Details 16
  4.4 Experimental Results 17
    4.4.1 Unsupervised Domain Adaptation Results on 3D Object Detection 17
    4.4.2 Within-domain Results on 3D Object Detection 18
  4.5 Ablations 19
    4.5.1 Discussion of Frame Selection 19
    4.5.2 Contribution of Adversarial Training 19
    4.5.3 Discussion of Input Features of Domain Classifier 24
  4.6 Analysis 24
    4.6.1 Visualization of Unsupervised Domain Adaptation Detection Results 24
    4.6.2 Visualization of Within-domain Detection Results 25
    4.6.3 Failure Cases Study 26
5 Conclusions 30
References 31

