Graduate Student: 吳佩珊 Wu, Pei-Shan
Thesis Title: 以具語意之區域為基礎重建單張靜態影像的深度圖 (Semantic-Region Based Depth Map Reconstruction from a Single Still Image)
Advisor: 陳永昌 Chen, Yung-Chang
Committee Members: 賴文能 Lie, Wen-Nung; 林惠勇 Lin, Huei-Yung
Degree: Master (碩士)
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering (電機資訊學院 - 電機工程學系)
Year of Publication: 2012
Academic Year: 100
Language: English
Pages: 56
Keywords (Chinese): 二維至三維的轉換、影像切割、場景辨識、相對深度估計
Keywords (English): 2D-to-3D Conversion, Image Segmentation, Scene Recognition, Relative Depth Estimation
Abstract (Chinese):
Three-dimensional (3D) images and video have led human visual enjoyment into a new generation. Because 3D imaging is consistent with how humans perceive the real world, it is expected to become the new mainstream. Cameras that directly capture 3D information are already on the market, but they are expensive and not yet widespread. Since we would also like existing and legacy 2D images and video to be viewable with a stereoscopic effect, 2D-to-3D conversion has become an urgent and practical solution.
Unlike traditional 2D images, depth information is an essential ingredient of 3D imaging. Among existing 2D-to-3D conversion algorithms, some are simple and computationally cheap enough for real-time systems, but they often produce erroneous results or cannot handle a single image; others produce reasonable results but require a large amount of computation time. To balance these two extremes, we propose an algorithm that efficiently reconstructs 3D information from a single image.
Among the many depth cues used to reconstruct 3D information, we choose the relative-height cue for its broad applicability. Given an input image, we first segment it into regions. These regions are then fed into semantic and surface machine-learning classifiers, which provide the principal information for the subsequent depth estimation. Next, we identify the salient regions that humans notice first and predict the horizon to improve the accuracy of the learned estimates. In addition, since humans are often the main subjects of an image, we add HOG-feature-based human detection to complete the system. Finally, the semantic information and the relative-height cue are used to estimate the corresponding depth map.
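The relative-height cue described above can be sketched in a few lines: pixels lower in the image are assumed closer to the camera, with depth growing toward the horizon. The linear mapping and the saturation above the horizon row are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def relative_height_depth(height, width, horizon_row):
    """Sketch of the relative-height depth cue (assumed linear model):
    depth is 0 at the bottom row (nearest to the camera), grows
    linearly toward the horizon row, and saturates at 1 above it."""
    rows = np.arange(height).reshape(-1, 1)
    # Normalised distance above the bottom row, clipped at the horizon.
    below = np.clip((height - 1 - rows) / max(height - 1 - horizon_row, 1),
                    0.0, 1.0)
    return np.broadcast_to(below, (height, width)).copy()

# Example: an 8x4 depth map with the horizon on row 3.
depth = relative_height_depth(8, 4, horizon_row=3)
```

The choice of a linear ramp is only the simplest monotone model; any function that increases with distance below the horizon would serve the same illustrative purpose.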
Abstract (English):
Three-dimensional (3D) images and video have led human vision into a new generation and are bringing on the next revolution, because the content provided for 3D display is much closer to the real world. Devices that directly capture 3D content are costly and, compared with 2D cameras, still uncommon; moreover, the tremendous amount of current and legacy media in 2D format should also be viewable with a stereoscopic effect. For these reasons, 2D-to-3D conversion has become a practical and urgent solution for 3D content providers.
Unlike traditional two-dimensional (2D) content, 3D virtual-view generation requires depth information. Many depth cues can be used to reconstruct 3D information from a 2D image. To handle outdoor scenes, which usually contain a horizon, we choose the relative-height cue because it is the most general. For an input image, we first apply an efficient graph-based image segmentation to obtain regions. These regions are then fed into a semantic and surface machine-learning system, which provides the main information for estimating the depth map. Second, we identify salient regions, those that are visually most noticeable, to provide semantic information for the subsequent stages. Horizon detection helps us not only to estimate depth values but also to improve semantic and surface classification. Moreover, because humans are a major challenge in the semantic-labeling stage, we apply HOG-based human detection to complement the system. Depth estimation based on the relative-height cue is then performed after these processes.
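The combination of semantic labels with the relative-height cue can be illustrated as follows. This is a minimal sketch under assumed conventions (three labels `sky`/`ground`/`vertical`, a linear ground ramp, and vertical regions inheriting the depth of their lowest pixel), not the thesis's implementation.

```python
import numpy as np

def depth_from_labels(labels, horizon_row):
    """Sketch: fuse per-pixel semantic labels with the relative-height cue.
    labels: 2-D array of strings in {'sky', 'ground', 'vertical'}.
    Sky gets maximum depth; ground depth grows linearly from the bottom
    row toward the horizon; a 'vertical' pixel inherits the ground depth
    at the lowest row of its column (where the object meets the ground)."""
    h, w = labels.shape
    rows = np.arange(h).reshape(-1, 1)
    ground = np.clip((h - 1 - rows) / max(h - 1 - horizon_row, 1), 0.0, 1.0)
    ground = np.broadcast_to(ground, (h, w))
    depth = np.where(labels == 'sky', 1.0, ground)  # sky is farthest
    for c in range(w):
        vert = np.where(labels[:, c] == 'vertical')[0]
        if vert.size:
            # Assign the whole vertical run the depth at its footprint row.
            depth[vert, c] = ground[vert.max(), c]
    return depth
```

Treating a vertical region as fronto-parallel (constant depth equal to its footprint) is the standard simplification behind relative-height methods; it is used here only to make the labeling-to-depth step concrete.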