| Student | 鄭家鈞 (Cheng, Chia-Chun) |
|---|---|
| Thesis Title | 3D Object Detection from Consecutive Monocular Images (從連續單眼影像進行三維物件偵測) |
| Advisor | 賴尚宏 (Lai, Shang-Hong) |
| Oral Defense Committee | 林彥宇 (Lin, Yen-Yu), 林嘉文 (Lin, Chia-Wen), 邱瀞德 (Chiu, Ching-Te) |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science - Department of Computer Science |
| Year of Publication | 2020 |
| Academic Year of Graduation | 108 |
| Language | English |
| Pages | 38 |
| Keywords (Chinese) | 三維電腦視覺、深度學習、運動與追蹤、色彩與深度影像處理、機器人視覺 |
| Keywords (English) | Deep Learning for Computer Vision, 3D Computer Vision, Motion and Tracking, RGBD and Depth Image Processing, Robot Vision |
Detecting the 3D position of objects is a crucial task in autonomous driving and robotics. Because LiDAR sensors are expensive, many image-based methods have been proposed. However, monocular images lack depth information, and occluded objects are hard to detect. In this thesis, we exploit consecutive monocular images to address these problems. We additionally predict the relative motion of objects between adjacent frames to reconstruct the scene of the previous frame, which allows us to recover depth information through multi-view geometric constraints. To learn object motion from unlabeled data, we propose an unsupervised loss function that learns object motion directly from consecutive images. Our experiments demonstrate the contributions of this motion loss and of the attention module in our model. We evaluate our method on the KITTI dataset, which provides a widely used 3D object detection benchmark for 3D localization in autonomous driving. On KITTI, the proposed method outperforms the state-of-the-art methods for 3D pedestrian and cyclist detection and achieves competitive results for 3D car detection.
Detecting objects in 3D space plays an important role in scene understanding, such as urban autonomous driving and mobile robot navigation. Many image-based methods have recently been proposed due to the high cost of LiDAR. However, monocular images lack depth information, and occluded objects are difficult to detect. In this paper, we propose to integrate 2D/3D object detection and 3D motion estimation over consecutive monocular images to overcome these problems. Additionally, we estimate the relative motion of each object between frames to reconstruct the scene at the previous timestamp. We can then recover depth cues from multi-view geometric constraints. To learn motion estimation from unlabeled data, we propose an unsupervised motion loss that learns 3D motion estimation from consecutive images. Our experiments on the KITTI dataset show that the proposed method outperforms the state-of-the-art methods for 3D Pedestrian and Cyclist detection and achieves competitive results for 3D Car detection.
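The record does not spell out the form of the unsupervised motion loss. As a rough illustration only, the sketch below shows one common way such a loss can be built: apply the estimated rigid inter-frame motion to 3D points from the current frame, project them into the previous image, and penalize the photometric difference against the current frame. All names (`photometric_motion_loss`, `warp_points`, and so on), the nearest-neighbor sampling, and the exact formulation are assumptions for illustration, not the thesis's actual implementation.

```python
import numpy as np

def warp_points(points_3d, R, t):
    """Apply a rigid 3D motion (rotation R, translation t) to Nx3 points."""
    return points_3d @ R.T + t

def project(points_3d, K):
    """Project 3D camera-frame points to pixel coordinates with intrinsics K."""
    uv = points_3d @ K.T
    return uv[:, :2] / uv[:, 2:3]

def photometric_motion_loss(img_prev, img_curr, points_3d, R, t, K):
    """Unsupervised motion loss sketch: warp current-frame 3D points by the
    estimated inter-frame motion, project into the previous image, and
    compare the sampled intensities with those at the current projections."""
    prev_uv = np.round(project(warp_points(points_3d, R, t), K)).astype(int)
    curr_uv = np.round(project(points_3d, K)).astype(int)
    h, w = img_prev.shape[:2]
    # Keep only points whose projections land inside both images.
    valid = ((prev_uv[:, 0] >= 0) & (prev_uv[:, 0] < w)
             & (prev_uv[:, 1] >= 0) & (prev_uv[:, 1] < h)
             & (curr_uv[:, 0] >= 0) & (curr_uv[:, 0] < w)
             & (curr_uv[:, 1] >= 0) & (curr_uv[:, 1] < h))
    if not np.any(valid):
        return 0.0
    sampled = img_prev[prev_uv[valid, 1], prev_uv[valid, 0]].astype(float)
    target = img_curr[curr_uv[valid, 1], curr_uv[valid, 0]].astype(float)
    # Mean absolute photometric error; zero motion on a static pair gives 0.
    return np.abs(sampled - target).mean()
```

Minimizing such a loss over unlabeled consecutive frames is what lets the motion branch train without 3D motion annotations; practical systems typically replace the nearest-neighbor sampling here with differentiable bilinear sampling so gradients flow to the motion estimate.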