| 研究生 (Student) | 郭芯妤 Kuo, Xin-Yu |
| --- | --- |
| 論文名稱 (Thesis Title) | 基於光流與單目相機視覺改良自我運動估測之方法 / Enhancement of Monocular Camera Ego-Motion Estimation via Optical Flow for Visual Odometry |
| 指導教授 (Advisor) | 李濬屹 Lee, Chun-Yi |
| 口試委員 (Committee Members) | 周志遠 Chou, Jerry; 胡敏君 Hu, Min-Chun |
| 學位類別 (Degree) | 碩士 Master |
| 系所名稱 (Department) | 電機資訊學院 - 資訊工程學系 (Computer Science) |
| 論文出版年 (Year of Publication) | 2020 |
| 畢業學年度 (Academic Year) | 108 |
| 語文別 (Language) | 英文 (English) |
| 論文頁數 (Pages) | 32 |
| 中文關鍵詞 (Chinese Keywords) | 視覺里程計 (visual odometry)、光流 (optical flow)、單目相機 (monocular camera) |
| 外文關鍵詞 (Keywords) | visual odometry, optical flow, monocular camera |
In this thesis, we propose a machine-learning-based framework, a decoupled ego-motion estimation method for monocular visual odometry abbreviated as \textit{DCVO}. The primary goal of DCVO is to improve the prediction accuracy of visual odometry (VO) through an architecture composed of four modules: an optical flow estimation model, a depth estimation model, a flow-depth fusion module, and a camera motion estimation model. By fusing the RGB video frames, the predicted optical flow maps, and the predicted depth maps in different ways, DCVO learns different features that assist camera motion estimation. We explore several strategies for fusing these features and compare them qualitatively and quantitatively. To determine the influence of these features on VO prediction accuracy, we examine four training schemes: one supervised scheme, and three additional schemes that apply both supervised and unsupervised loss terms to the DCVO framework. The supervised scheme relies on the discrepancy from the ground-truth data, while the latter three schemes add auxiliary loss terms on top of it. We conduct extensive experiments on the KITTI odometry dataset and compare DCVO against a number of representative baseline models. The results measured on the training and testing sequences of the KITTI dataset show that, compared with the other baselines, our method achieves the lowest error rates when using the fusion configuration that combines optical flow maps and RGB frames. To further improve DCVO, we perform several ablation analyses over different components, including different combinations of architectures and training schemes. These analyses show that using depth differences as the model input yields better results than using depth itself, making delta depth a promising feature for future VO research.
In this thesis, we propose a learning-based framework called the decoupled ego-motion estimation methodology for monocular visual odometry, abbreviated as \textit{DCVO}. The primary objective of DCVO is to enhance the prediction accuracy of visual odometry (VO) by introducing an architecture consisting of four modules: a flow estimation module, a depth estimation module, a flow-depth fusion module, and a pose estimation module. By allowing the RGB input frames to be fused with the predicted flow and depth maps in different manners, DCVO enables the investigation of different features that contribute to the estimation of camera motion. In this thesis, we explore various fusion strategies for combining the above features and compare them from both qualitative and quantitative perspectives. In order to identify the impact of these features on VO prediction accuracy, we inspect four different training schemes: one supervised training scheme and three additional schemes that concurrently apply both supervised and unsupervised loss terms to the DCVO framework. The supervised training scheme relies on comparison against the ground-truth data, while the latter three schemes incorporate auxiliary loss terms in addition to the supervised one. We perform extensive experiments on the KITTI Odometry Dataset and examine the proposed DCVO against a number of representative baseline methods. Our measured results on the training and testing sequences of the KITTI dataset reveal that one of the fusion configurations, which concentrates on flow maps and RGB frames, leads to the lowest error rates when compared to the other baselines considered in our experiments. In order to further improve DCVO, we conduct multiple ablation analyses for different components, covering both the architectures and the training schemes. The analyses point out that a new delta depth representation is a promising candidate feature to be incorporated in future VO research.
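To make the decoupled four-module design concrete, the sketch below illustrates how stacked RGB frames, a predicted flow map, and a predicted depth map could be fused before regressing a relative camera pose. It is a minimal PyTorch-style illustration rather than the thesis implementation: the module names (`SmallEncoder`, `DecoupledVO`), channel widths, input resolution, and the 6-DoF pose parameterization are assumptions made purely for this example.

```python
# Minimal sketch of a decoupled VO pipeline in the spirit of DCVO.
# All names, sizes, and the 6-DoF pose output are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallEncoder(nn.Module):
    """Stand-in backbone for the flow or depth module (hypothetical)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)


class DecoupledVO(nn.Module):
    """Four modules: flow estimation, depth estimation, fusion, and pose."""
    def __init__(self):
        super().__init__()
        self.flow_net = SmallEncoder(6, 2)   # two stacked RGB frames -> 2-ch flow
        self.depth_net = SmallEncoder(3, 1)  # single RGB frame -> 1-ch depth
        self.fusion = nn.Conv2d(6 + 2 + 1, 64, 3, padding=1)  # RGB + flow + depth
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 6)  # 6-DoF pose
        )

    def forward(self, frame_t, frame_t1):
        pair = torch.cat([frame_t, frame_t1], dim=1)
        flow = self.flow_net(pair)
        depth = self.depth_net(frame_t)
        # Upsample the coarse predictions back to image resolution before fusion.
        flow = F.interpolate(flow, size=frame_t.shape[-2:])
        depth = F.interpolate(depth, size=frame_t.shape[-2:])
        fused = torch.relu(self.fusion(torch.cat([pair, flow, depth], dim=1)))
        return self.pose_head(fused)  # relative camera motion between the frames


# One supervised training step against a ground-truth relative pose.
# A mixed scheme of the kind described above would add auxiliary unsupervised
# terms (e.g., a photometric reconstruction loss) to this supervised loss.
model = DecoupledVO()
f0, f1 = torch.rand(1, 3, 128, 416), torch.rand(1, 3, 128, 416)
pose_gt = torch.zeros(1, 6)
loss = F.mse_loss(model(f0, f1), pose_gt)
loss.backward()
```

Keeping flow and depth as separate modules, as in this sketch, is what allows the different fusion configurations compared in the thesis (e.g., RGB with flow only, or RGB with flow and depth or delta depth) to be swapped at the fusion stage without retraining the entire pipeline from scratch.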