
Graduate Student: Chin, Wei-Sheng (覃韋勝)
Thesis Title: 2D-to-3D Conversion by Integrating High Level and Low Level Depth Cues for Single Still Image (針對單張影像並結合高階與低階深度線索的二維到三維轉換)
Advisor: Chen, Yung-Chang (陳永昌)
Oral Examination Committee: Lie, Wen-Nung (賴文能); Lin, Huei-Yung (林惠勇)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2012
Graduation Academic Year: 100 (2011-2012)
Language: English
Number of Pages: 71
Keywords (Chinese): 深度估測, 場景重建, 馬可夫場
Keywords (English): Depth Estimation, Scene Reconstruction, Markov Random Field
    Due to the steadily falling price of stereoscopic display devices, stereoscopic multimedia has gradually entered people's daily lives. However, the development of the stereoscopic multimedia industry and its related applications is still clearly held back by a shortage of content, whereas conventional 2D multimedia content is mature and abundant. We are therefore convinced that automatically converting the large body of existing 2D multimedia into stereoscopic formats is a worthwhile research topic. To achieve this goal, the most important step is to build a mathematical model that describes the relation between a 2D image and its corresponding 3D scene.
    Over the past decade, many scene reconstruction methods have been proposed, including algorithms that infer the scene from texture features, defocus blur, shading on object surfaces, and object motion. Most of them, however, consider only a few depth cues, or even a single one. According to research from Stanford University, machine learning is a good way to take many depth cues into account at once, provided that training data are available. In addition, we extract scene information from high-level computer vision. All of this information is combined into a second-order objective function using a Markov random field; together with the inequality constraints we design, the depth map corresponding to the image can then be solved by quadratic programming (a sketch of such an objective follows this abstract).
    In summary, we propose an automatic 2D-to-3D conversion algorithm that considers both high-level and low-level depth cues. It can serve as a tool for promoting stereoscopic multimedia and for advanced image analysis.
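
    As a reading aid only, the following block spells out the kind of constrained quadratic objective described above; the symbols d_i, \hat{d}_i, w_{ij}, \lambda and the depth bounds are illustrative placeholders rather than the thesis's own notation.

        \begin{aligned}
        \min_{d}\quad & \sum_{i}\bigl(d_i - \hat{d}_i\bigr)^{2}
          \;+\; \lambda \sum_{(i,j)\in\mathcal{N}} w_{ij}\bigl(d_i - d_j\bigr)^{2} \\
        \text{s.t.}\quad & d_{\min} \le d_i \le d_{\max} \quad \text{for all } i
        \end{aligned}

    Here d_i is the depth assigned to superpixel i, \hat{d}_i is the depth predicted from the low-level cues, and w_{ij} weights the smoothness between neighboring superpixels according to the high-level scene information. Because the objective is quadratic and the constraints are linear, the minimization is a quadratic program.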


    Due to the rapidly falling cost of 3D display devices, 3D multimedia has gradually come into our daily lives in recent years. However, the promotion of 3D multimedia and related applications is still restricted by insufficient stereoscopic content. By contrast, 2D materials are abundant and easily available, so we can fairly conclude that generating 3D content from these existing materials is a promising direction. To accomplish this, it is necessary to design a mathematical model that describes the relation between an image and its corresponding scene.
    In the past ten years, much research on 2D-to-3D conversion has been published, such as depth-from-texture, depth-from-focus, depth-from-shading, and depth-from-motion, but most of these methods consider only a few depth cues. Following the work of Saxena et al., we use machine learning to combine many depth cues into a feature vector, and we explore scene structure with high-level computer vision, e.g., natural boundary detection and surface classification. In our model, the basic unit of a scene is a small plane called a 'superpixel', formed by aggregating similar pixels. All depth cues and scene information, which correspond to the local term and the smoothness term respectively, are combined using a high-order Markov random field. Because we choose the L2 norm as the error function, the optimal depth map can be found by quadratic programming; a small numerical sketch of this step follows the abstract.
    Consequently, we propose a 2D-to-3D conversion algorithm that considers both high-level and low-level vision cues to generate 3D content automatically. It can be regarded as a tool for promoting 3D multimedia and advanced image analysis.
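
    To make the optimization step concrete, the indented block below is a minimal, self-contained sketch in Python using the cvxpy package. All names (depth_prior, edges, weights, lam) and values are hypothetical placeholders, and this is not the thesis's implementation, which according to reference [69] was built on the CVX package for Matlab.

    # Toy quadratic program in the spirit of the abstract: a data term that pulls
    # each superpixel toward the depth predicted from low-level cues, a smoothness
    # term weighted by high-level scene structure, and simple depth-range bounds.
    import numpy as np
    import cvxpy as cp

    n = 5                                              # number of superpixels (toy size)
    depth_prior = np.array([2.0, 2.5, 3.0, 8.0, 9.0])  # depths predicted from low-level cues
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]           # neighboring superpixel pairs
    weights = np.array([1.0, 1.0, 0.1, 1.0])           # smoothness weights (small across a likely depth edge)
    lam = 0.5                                          # trade-off between data and smoothness terms

    d = cp.Variable(n)                                 # unknown depth of each superpixel
    data_term = cp.sum_squares(d - depth_prior)
    diffs = cp.hstack([d[i] - d[j] for i, j in edges])
    smooth_term = cp.sum(cp.multiply(weights, cp.square(diffs)))

    problem = cp.Problem(cp.Minimize(data_term + lam * smooth_term),
                         [d >= 0.5, d <= 80.0])        # inequality constraints on the depth range
    problem.solve()
    print(np.round(d.value, 2))                        # one optimized depth per superpixel

    In this toy example the small weight on the (2, 3) edge keeps the smoothness term from flattening the depth discontinuity between the two groups of superpixels, which is the qualitative behavior the high-level weighting is meant to produce.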

    Abstract .... i
    Table of Contents .... ii
    List of Figures .... iv
    List of Tables .... vii
    Chapter 1: Introduction .... 1
        1-1. Overview of 3-D multimedia .... 1
        1-2. Motivation .... 6
        1-3. Thesis organization .... 7
    Chapter 2: Related Works .... 8
        2-1. Scene reconstruction .... 8
        2-2. Scene understanding .... 15
    Chapter 3: Proposed Method .... 17
        3-1. Algorithm overviews .... 17
        3-2. Color transfer .... 19
        3-3. Small plane model .... 20
        3-4. Surface classification .... 22
            3-4-1. Generating spatial support .... 22
            3-4-2. Surface feature selection and extraction .... 24
            3-4-3. Inference process .... 31
        3-5. Local depth estimation .... 33
        3-6. Connectivity and co-planarity structure .... 37
        3-7. Energy function .... 40
    Chapter 4: Experiments and Results .... 44
        4-1. Training Process .... 44
            4-1-1. Classifiers for surface classification .... 45
            4-1-2. Depth estimator .... 48
            4-1-3. Classifiers for co-planarity .... 51
        4-2. Inference .... 52
    Chapter 5: Conclusions and Future Works .... 63
    Reference .... 66

    [1] K. Müller, "3D Video Formats and Coding Standard," Fraunhofer Institute for Telecommunication, 11 2 2009. [Online]. Available: http://see.xidian.edu.cn/conference/mpegjpeg/workshop/PPT/karstenmuller.pdf.
    [2] W.-Y. Chen, Y.-L. Chang, H.-K. Chiu, S.-Y. Chien and L.-G. Chen, "Real-Time Depth Image based Rendering Hardware Accelerator for Advanced Three Dimensional Television System," in IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, 2006.
    [3] lcdprice, "Learn principles to show 3D images from past to present," 2 9 2011. [Online]. Available: http://www.lcdprice.net/learn-principles-3dimages/. [Accessed 17 6 2012].
    [4] Wikipedia contributors, "Parallax," Wikipedia, 6 6 2012. [Online]. Available: http://en.wikipedia.org/w/index.php?title=Parallax&oldid=496240258. [Accessed 11 6 2012].
    [5] A. Vetro, A. M. Tourapis, K. Müller and T. Chen, "3D-TV Content Storage and Transmission," IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 384-394, 2011.
    [6] H. Brust, A. Smolic, K. Mueller, G. Tech and T. Wiegand, "Mixed Resolution Coding of Stereoscopic Video for Mobile Devices," in 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, Potsdam, Germany, 2009.
    [7] P. Merkle, A. Smolic, K. Müller and T. Wiegand, "Efficient Prediction Structures for Multiview Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 11, pp. 1461-1473, 2007.
    [8] C. Fehn, "Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV," in Stereoscopic Displays and Virtual Reality Systems XI, A. J. Woods, J. O. Merritt, S. A. Benton and M. T. Bolas, Eds., San Jose, CA, USA, 2004.
    [9] F. Cozman and E. Krotkov, "Depth from Scattering," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition(CVPR), San Juan , Puerto Rico, 1997.
    [10] J. H. Elder and S. W. Zucker, "Local Scale Control for Edge Detection and Blur Estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 7, pp. 699-716, 1998.
    [11] S. K. Nayar and Y. Nakagawa , "Shape from Focus," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 8, pp. 824-831, 1994.
    [12] D. Ziou, S. Wang and J. Vaillancourt, "Depth from Defocus using the Hermite Transform," in International Conference on Image Processing, Chicago, Illinois, USA, 1998.
    [13] P. Favaro, S. Soatto, M. Burger and S. J. Osher, "Shape from Defocus via Diffusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 518-531, 2008.
    [14] T. Lindeberg and J. Garding, "Shape from texture from a multi-scale perspective," in 4th International Conference on Computer Vision, Berlin, Germany, 1993.
    [15] A. M. Loh and R. Hartley, "Shape from Non-homogeneous, Nonstationary, Anisotropic, Perspective Texture," in British Machine Vision Conference, Oxford, UK, 2005.
    [16] S. Battiato, S. Curti, M. L. Cascia, M. Tortora and E. Scordato, "Depth Map Generation by Image Classification," in Proceedings of SPIE 5302, San Jose, CA, USA, 2004.
    [17] Y.-M. Tsai, Y.-L. Chang and L.-G. Chen, "Block-based Vanishing Line and Vanishing Point Detection for 3D Scene Reconstruction," Proceedings of the International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 586-589, 2006.
    [18] R. Szeliski, Computer Vision: Algorithms and Applications, Springer, 2010.
    [19] R. Zhang, P.-S. Tsai, J. E. Cryer and M. Shah, "Shape-from-shading: a Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 8, pp. 690-706, 1999.
    [20] D. Hoiem, A. A. Efros and M. Hebert, "Recovering Occlusion Boundaries from an Image," International Journal of Computer Vision, vol. 91, no. 3, 2011.
    [21] X. Yan, Y. Yang, G. Er and Q. Dai, "Depth Map Generation for 2D-to-3D Conversion by Limited User Inputs and Depth Propagation," in 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, Antalya, Turkey, 2011.
    [22] Y. Lu, J. Z. Zhang, Q. M. J. Wu and Z.-n. Li, "A Survey of Motion-parallax-based 3D Reconstruction Algorithms," IEEE Trans. on Systems, Man, and Cybernetics, vol. 34, pp. 532-548, 2004.
    [23] G. Qian and R. Chellappa, "Structure from Motion Using Sequential Monte Carlo Methods," in Proceedings of the IEEE International Conference on Computer Vision, vol. 2, pp. 614-621, 2004.
    [24] D. Kim, D. Min and K. Sohn, "A stereoscopic video generation method using stereoscopic display characterization and motion analysis," IEEE Trans. on Broadcasting, vol. 54, no. 2, pp. 188-197, 2008.
    [25] E. Tola, V. Lepetit and P. Fua, "A Fast Local Descriptor for Dense Matching," in IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, USA, 2008.
    [26] G. Zhang, J. Jia, T.-T. Wong and H. Bao, "Recovering Consistent Video Depth Maps via Bundle Optimization," in IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, USA, 2008.
    [27] T. Kanade and M. Okutomi, "A stereo matching algorithm with an adaptive window: theory and experiment," in Proceedings of the IEEE International Conference on Robotics and Automation, vol. 2, pp. 1088-1095, 1991.
    [28] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, 2004.
    [29] S. Gould, R. Fulton and D. Koller, "Decomposing a scene into geometric and semantically consistent regions," Proc. IEEE International Conference on Computer Vision, pp. 1-8, Sep.–Oct. 2009.
    [30] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
    [31] A. Saxena, M. Sun and A. Y. Ng, "Make3D: Learning 3D Scene Structure from a Single Still Image," IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), vol. 31, no. 5, pp. 824-840, 2009.
    [32] A. Vailaya, M. A. T. Figueiredo, A. K. Jain and H.-J. Zhang, "Image Classification for Content-Based Indexing," IEEE Transactions on Image Processing, vol. 10, no. 1, pp. 117-129, 2001.
    [33] D. LU and Q. WENG, "A Survey of Image Classification Methods and Techniques for Improving Classification Performance," International Journal of Remote Sensing, vol. 28, no. 5, pp. 823-870, 2007.
    [34] Z. Lu, Y. Peng and H. H. Ip, "Image categorization via robust pLSA," Pattern Recognition Letters, vol. 31, pp. 36-43, 2010.
    [35] Z. Lin and L. S. Davis, "A Pose-Invariant Descriptor for Human Detection and Segmentation," in 10th European Conference on Computer Vision, Marseille, France, 2008.
    [36] H. Schneiderman and T. Kanade, "A Statistical Method for 3D Object Detection Applied to Faces and Cars," in IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 2000.
    [37] C. Li, L. Guo and Y. Hu, "A New Method Combining HOG and Kalman Filter for Video-based Human Detection and Tracking," in 3rd International Congress on Image and Signal Processing, Yantai, China, 2010.
    [38] C. Liu, J. Yuen and A. Torralba, "SIFT Flow: Dense Correspondence across Scenes and Its Applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 5, pp. 978-994, May 2011.
    [39] C. Liu, J. Yuen and A. Torralba, "Nonparametric Scene Parsing via Label Transfer," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 33, pp. 2368 - 2382, 2011.
    [40] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient belief propagation for early vision," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC, USA, 2004.
    [41] A. Saxena, S. H. Chung and A. Y. Ng, "Learning Depth from Single Monocular Images," Neural Information Processing Systems, vol. 18, 2005.
    [42] E. Reinhard, M. Ashikhmin, B. Gooch and P. Shirley, "Color Transfer Between Images," IEEE Computer Graphics and Applications, vol. 21, no. 5, pp. 34-41, 2001.
    [43] P. Felzenszwalb and D. Huttenlocher, "Efficient Graph-Based Image Segmentation," International Journal of Computer Vision, vol. 59, no. 2, pp. 167-181, 2004.
    [44] D. Hoiem, A. A. Efros and M. Hebert, "Recovering Surface Layout from an Image," IJCV, vol. 75, no. 1, pp. 151-172, 2007.
    [45] S. Gould, R. Fulton and D. Koller, "Decomposing a Scene into Geometric and Semantically Consistent Regions," in IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 2009.
    [46] D. Hoiem, A. A. Efros and M. Hebert, "Geometric Context from a Single Image," in Tenth IEEE International Conference on Computer Vision, Beijing, China, 2005.
    [47] T. Cour, F. Benezit and J. Shi, "Spectral Segmentation with Multiscale Graph Decomposition," in IEEE International Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 2005.
    [48] J. Shi and J. Malik, "Normalized cuts and image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, 2000.
    [49] D. Hoiem, "Seeing the world behind the image," doctoral dissertation, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, 2007.
    [50] T. Leung and J. Malik, "Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons," International Journal of Computer Vision, vol. 43, no. 1, pp. 29-44, 2001.
    [51] J. Košecká and W. Zhang, "Video Compass," in 7th European Conference on Computer Vision (ECCV), Copenhagen, Denmark, 2002.
    [52] W. contributors, "Connected-component labeling," Wikipedia, The Free Encyclopedia. , 18 3 2012. [Online]. Available: http://en.wikipedia.org/wiki/Connected-component_labeling. [Accessed 26 5 2012].
    [53] J. Choi, W. Kim, H. Kong and C. Kim, "Real-time Vanishing Point Detection Using the Local Dominant Orientation Signature," 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, pp. 1-4, 16-18 3 2011.
    [54] A. Minagawa, N. Tagawa, T. Moriya and T. Gotoh, "Line clustering with vanishing point and vanishing line," International Conference on Image Analysis and Processing, pp. 388-393, 1999.
    [55] C. Tomasi and R. Manduchi, "Bilateral Filtering for Gray and Color Images," in Sixth International Conference on Computer Vision, Bombay, India, 1998.
    [56] A. Buades and J.-M. Morel, "A Non-local Algorithm for Image Denoising," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, 2005.
    [57] J. Friedman, T. Hastie and R. Tibshirani, "Additive Logistic Regression: A Statistical View of Boosting," The Annals of Statistics, vol. 28, no. 2, pp. 337-407, 2000.
    [58] T. Hassner and R. Basri, "Example Based 3D Reconstruction from Single 2D Images," in Conference on Computer Vision and Pattern Recognition Workshop, New York, USA, 2006.
    [59] A. Saxena, S. H. Chung and A. Y. Ng, "3-D Depth Reconstruction from a Single Still Image," International Journal of Computer Vision, vol. 76, pp. 53-69, 2008.
    [60] M. Rachidi, A. Marchadier, C. Gadois, E. Lespessailles, C. Chappard and C. L. Benhamou, "Laws' Masks Descriptors Applied to Bone Texture Analysis: an Innovative and Discriminant Tool in Osteoporosis," Skeletal Radiology, vol. 37, no. 6, pp. 541-548, 2008.
    [61] D. G. Lowe, "Distinctive Image Features from Scale-invariant Keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
    [62] M. Antonini, M. Barlaud, P. Mathieu and I. Daubechies, "Image Coding Using Wavelet Transform," IEEE Transactions on Image Processing, vol. 1, no. 2, pp. 205-220, 1992.
    [63] P. Pérez, M. Gangnet and A. Blake, "Poisson image editing," ACM SIGGRAPH, pp. 313-318, 2003.
    [64] D. R. Martin, C. C. Fowlkes and J. Malik, "Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 530-549, 2004.
    [65] M. Basu, "Gaussian Derivative Model for Edge Enhancement," Pattern Recognition, vol. 27, no. 11, pp. 1451-1461, 1994.
    [66] K. G. Derpanis and J. M. Gryn, "Three-Dimensional Nth Derivative of Gaussian Separable Steerable Filters," in IEEE International Conference on Image Processing, Genova, Italy, 2005.
    [67] Y. Rubner and C. Tomasi, "Coalescing Texture Descriptors," in ARPA Image Understanding Workshop, 1996, pp. 927-935.
    [68] M. Grant and S. Boyd, "Graph implementations for nonsmooth convex programs," in Recent Advances in Learning and Control, V. Blondel, S. Boyd and H. Kimura, Eds., Springer-Verlag Limited, 2008, pp. 95-110.
    [69] M. Grant and S. Boyd, "CVX: Matlab Software for Disciplined Convex Programming, version 1.21," 3 2011. [Online]. Available: http://cvxr.com/cvx/citing/. [Accessed 7 6 2012].
    [70] D. Hoiem, "Surface Context," University of Illinois, 2005. [Online]. Available: http://www.cs.illinois.edu/homes/dhoiem/. [Accessed 8 6 2012].
    [71] A. Saxena and A. Y. Ng, "Make3D," [Online]. Available: http://make3d.cs.cornell.edu/.
    [72] T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Machine Learning, vol. 42, pp. 177-196, 2001.

    Full-text availability: not authorized for public release (campus network, off-campus network, and National Central Library: Taiwan NDLTD system).