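Graduate student: | 曾聖博 Tseng, Sheng-Po |
---|---|
Thesis title: | 從影片重建具有細節的深度圖 Recovering detail-preserving depth maps from a video sequence |
Advisor: | 賴尚宏 Lai, Shang-Hong |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2010 |
Academic year of graduation: | 98 |
Language: | English |
Number of pages: | 45 |
Keywords (Chinese): | 三維重建、影片、深度圖 |
Keywords (English): | 3D reconstruction, video, depth maps |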
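In this thesis, we propose a system that builds depth maps from videos of outdoor scenes. Because of the particular characteristics of video, our approach draws on more temporal-domain information than traditional depth reconstruction methods.
First, we find continuous Scale Invariant Feature Transform (SIFT) correspondences across consecutive frames of the video and use them to perform structure from motion, which yields the camera information, including translation and rotation, for every image in the video. Next, we compute constrained optical flow for a set of selected frames, which lets us solve an over-constrained linear system for a predicted depth map of each frame. Mean shift image segmentation is then used to reduce errors and outliers in textureless regions. By combining the predicted depth maps, the segmentation results, and other physical constraints, we build an initial depth map. This initial depth map serves as the data term of the Markov Random Field (MRF) used to produce the final depth map. By minimizing the MRF energy function for each frame, we turn the initial depth map into a depth result that is visually pleasing, detail-preserving, and temporally consistent.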
In this thesis, we propose a novel system to estimate the depth of outdoor scenes from a video sequence. Exploiting the characteristics of video, our approach uses more information from the temporal domain than traditional depth reconstruction methods.
We perform Structure from Motion (SfM) on images sampled from a video by extracting and matching Scale Invariant Feature Transform (SIFT) feature points. This provides camera information, including 3D translation and rotation, for all the images. Then, we compute constrained optical flow between selected frames so that we can solve an over-constrained linear system to estimate a predicted depth map for each frame. After that, mean shift image segmentation [11] is applied to reduce estimation errors in textureless regions and to suppress outlier points. The initial depth maps are built by combining the predicted depth maps, the segmentation results, and some geometric constraints. Each initial depth map becomes the data term of our pixel-based and region-based Markov Random Field (MRF) formulation for depth map estimation. By minimizing the associated MRF energy function for each frame, we refine the depth maps to achieve visually pleasing, detail-preserving, and temporally consistent depth estimation results.
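The abstract does not spell out the over-constrained linear system, so the following is only a minimal sketch of one common way such a system can arise, not the thesis's actual formulation. It assumes the reference camera sits at the origin, each pixel defines a viewing ray in normalized image coordinates, and each other frame supplies a pose `(R, t)` (as SfM would) plus a matched point (as optical flow would). Every view then gives two linear equations in the single unknown depth `d`, and stacking them yields a least-squares problem. The names `depth_least_squares`, `project`, and `mat_vec` are hypothetical helpers introduced for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
View = Tuple[Mat3, Vec3, Tuple[float, float]]


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def project(R: Mat3, t: Vec3, point: Vec3) -> Tuple[float, float]:
    """Project a reference-frame 3D point into a view with pose (R, t).

    Returns normalized image coordinates (intrinsics already removed).
    """
    p = mat_vec(R, point)
    x, y, z = p[0] + t[0], p[1] + t[1], p[2] + t[2]
    return (x / z, y / z)


def depth_least_squares(ray: Vec3, views: Sequence[View]) -> float:
    """Estimate the depth d of the point d * ray seen in several views.

    In a view with pose (R, t) the point is d * (R ray) + t. Writing
    a = R ray and using u = x / z, v = y / z gives, per view,
        d * (u * a_z - a_x) = t_x - u * t_z
        d * (v * a_z - a_y) = t_y - v * t_z
    i.e. two equations of the form c * d = b. With more than one
    equation the system is over-constrained, and the least-squares
    solution is d = sum(c * b) / sum(c * c).
    """
    num = 0.0
    den = 0.0
    for R, t, (u, v) in views:
        a = mat_vec(R, ray)
        for obs, axis in ((u, 0), (v, 1)):
            c = obs * a[2] - a[axis]
            b = t[axis] - obs * t[2]
            num += c * b
            den += c * c
    if den == 0.0:
        # No parallax: the observations carry no depth information.
        raise ValueError("degenerate configuration: depth is unobservable")
    return num / den
```

To use the sketch, build one `(R, t, (u, v))` tuple per neighboring frame for a given reference pixel and call `depth_least_squares(ray, views)`. Adding more frames adds more equations, which is how temporal information can make the per-pixel estimate more robust to noisy matches.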
[1] V. Hedau, D. Hoiem, and D. Forsyth. Recovering the spatial layout of cluttered rooms. In ICCV, 2009.
[2] S. Yu, H. Zhang, and J. Malik. Inferring spatial layout from a single image via depth-ordered grouping. In the 6th IEEE Computer Society Workshop on Perceptual Organization in Computer Vision, Anchorage, Alaska, 23 June 2008.
[3] A. Saxena, M. Sun, and A. Y. Ng. Make3D: Learning 3D Scene Structure from a Single Still Image. In PAMI, 2008.
[4] B. Liu, S. Gould, D. Koller. Single Image Depth Estimation From Predicted Semantic Labels. In CVPR, 2010.
[5] O. Pele, M. Werman. A Linear Time Histogram Metric for Improved SIFT Matching. In ECCV, 2008.
[6] Z. Wang, Z. Zheng. A Region Based Stereo Matching Algorithm Using Cooperative Optimization. In CVPR, 2008.
[7] L. Xu, J. Jia. Stereo Matching: An Outlier Confidence Approach. In ECCV, 2008.
[8] A. Klaus, M. Sormann, K. Karner. Segment-Based Stereo Matching Using Belief Propagation and a Self-Adapting Dissimilarity Measure. In ICPR, 2006.
[9] D. Martinec, T. Pajdla. 3D Reconstruction by Fitting Low-Rank Matrices with Missing Data. CVPR 2005, pp. 198-205, IEEE June 2005.
[10] M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, R. Koch, Visual modeling with a hand-held camera, International Journal of Computer Vision 59(3), 207-232, 2004.
[11] D. Comaniciu, P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Machine Intell., May 2002.
[12] D. Hoiem, A. A. Efros, and M. Hebert. Automatic photo pop-up. In SIGGRAPH, 2005.
[13] A. Saxena, S. H. Chung, A. Y. Ng. Learning Depth from Single Monocular Images. In NIPS, 2005.
[14] A. Saxena, S. H. Chung, A. Y. Ng. 3-D depth reconstruction from a single still image. In IJCV, 2007.
[15] A. Saxena, J. Schulte, A. Y. Ng. Depth estimation using monocular and stereo cues. In IJCAI, 2007.
[16] M. Brown, D. G. Lowe. Unsupervised 3D object recognition and reconstruction in unordered datasets. In Proceedings of the international conference on 3D digital imaging and modeling, 2005.
[17] Noah Snavely, Steven M. Seitz, Richard Szeliski. Modeling the World from Internet Photo Collections. International Journal of Computer Vision, 2007.
[18] M. Lourakis, A. Argyros, (2004). The design and implementation of a generic sparse bundle adjustment software package based on the Levenberg–Marquardt algorithm (Technical Report 340). Inst. of Computer Science-FORTH, Heraklion, Crete, Greece.
[19] G. Zhang, J. Jia, T. Wong, H. Bao. Recovering Consistent Video Depth Maps via Bundle Optimization. In CVPR, 2008.
[20] B. K. P. Horn, B. G. Schunck. Determining optical flow. Artificial Intelligence, vol. 17, pp. 185-203, 1981.
[21] C. H. Teng, S. H. Lai, Y. S. Chen. Accurate optical flow computation under non-uniform brightness variations. Computer Vision and Image Understanding, vol. 97, no.3, pp. 315-346, 2005.
[22] C. K. Hsieh, S. H. Lai, Y. C. Chen. Expression-Invariant Face Recognition With Constrained Optical Flow Warping. IEEE Transactions on Multimedia, vol. 11, no. 4, pp. 600-610, 2009.
[23] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A Comparative Study of Energy Minimization Methods for Markov Random Fields. In Ninth European Conference on Computer Vision (ECCV 2006), volume 2, pages 16-29, Graz, Austria, May 2006.
[24] Y. Boykov, O. Veksler, and R. Zabih. Fast Approximate Energy Minimization via Graph Cuts. In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 23, no. 11, pages 1222-1239, November 2001.
[25] V. Kolmogorov and R. Zabih. What Energy Functions can be Minimized via Graph Cuts? In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 26, no. 2, pages 147-159, February 2004. An earlier version appeared in European Conference on Computer Vision (ECCV), May 2002.
[26] Y. Boykov and V. Kolmogorov. An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision. In IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 26, no. 9, pages 1124-1137, September 2004.
[27] M. T. Pourazad, P. Nasiopoulos, R. K. Ward. An H.264-based Scheme for 2D to 3D Video Conversion. In IEEE Transactions on Consumer Electronics, vol. 55, no. 2, 2009.
[28] M. Bleyer, M. Gelautz. Temporally Consistent Disparity Maps from Uncalibrated Stereo Videos. In Image and Signal Processing and Analysis, 2009.
[29] C.-C. Chang, C.-J. Lin. LIBSVM: a library for support vector machines, 2001.
[30] Y. Taguchi, B. Wilburn, C. L. Zitnick. Stereo Reconstruction with Mixed Pixels Using Adaptive Over-Segmentation. In CVPR, 2008.
[31] G. Zhang, J. Jia, T. Wong and H. Bao. Consistent Depth Maps Recovery from a Video Sequence. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 31(6):974-988, 2009.
[32] C. C. Cheng, C.-T. Li, P.-S. Huang, T.-K. Lin, Y.-M. Tsai, and L.-G. Chen. A Block-based 2D-to-3D Conversion System with Bilateral Filter. International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, Jan. 2009.