Graduate Student: 吳佩珊 Wu, Pei-Shan
Thesis Title: 以具語意之區域為基礎重建單張靜態影像的深度圖 (Semantic-Region Based Depth Map Reconstruction from a Single Still Image)
Advisor: 陳永昌 Chen, Yung-Chang
Committee Members: 賴文能 Lie, Wen-Nung; 林惠勇 Lin, Huei-Yung
Degree: Master (碩士)
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering (電機資訊學院 - 電機工程學系)
Year of Publication: 2012
Academic Year: 100
Language: English
Pages: 56
Keywords (Chinese): 二維至三維的轉換、影像切割、場景辨識、相對深度估計
Keywords (English): 2D-to-3D Conversion, Image Segmentation, Scene Recognition, Relative Depth Estimation
Abstract (Chinese):
Three-dimensional (3D) images and video have led human visual enjoyment into a new generation. Because 3D imaging is consistent with how humans perceive the real world, it is expected to become the new mainstream. Cameras that directly capture 3D information are already on the market, but they are expensive and not yet widespread. Since we would also like existing and legacy 2D images and video to be viewable with a stereoscopic effect, 2D-to-3D conversion has become an urgent and practical solution.
Unlike traditional 2D images, depth information is an essential ingredient of 3D imaging. Among existing 2D-to-3D conversion algorithms, some are simple and computationally cheap enough for real-time systems, but they often produce erroneous results or cannot handle a single image; others produce reasonable results but require a large amount of computation time. To balance these two extremes, we propose an algorithm that efficiently reconstructs 3D information from a single image.
Among the many depth cues used to reconstruct 3D information, we choose the relative-height cue for its broad applicability. Given an input image, we first segment it into regions. These regions are then fed into semantic and surface machine-learning classifiers, which provide the principal information for the subsequent depth estimation. Next, we identify the salient regions that humans notice first and predict the horizon to improve the accuracy of the learned estimates. In addition, since humans are often the main subjects of an image, we add HOG-feature-based human detection to complete the system. Finally, the semantic information and the relative-height cue are used to estimate the corresponding depth map.
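The relative-height cue described above can be sketched in a few lines: pixels lower in the image are assumed closer to the camera, with depth growing toward the horizon. The linear mapping and the saturation above the horizon row are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def relative_height_depth(height, width, horizon_row):
    """Sketch of the relative-height depth cue (assumed linear model):
    depth is 0 at the bottom row (nearest to the camera), grows
    linearly toward the horizon row, and saturates at 1 above it."""
    rows = np.arange(height).reshape(-1, 1)
    # Normalised distance above the bottom row, clipped at the horizon.
    below = np.clip((height - 1 - rows) / max(height - 1 - horizon_row, 1),
                    0.0, 1.0)
    return np.broadcast_to(below, (height, width)).copy()

# Example: an 8x4 depth map with the horizon on row 3.
depth = relative_height_depth(8, 4, horizon_row=3)
```

The choice of a linear ramp is only the simplest monotone model; any function that increases with distance below the horizon would serve the same illustrative purpose.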
Abstract (English):
Three-dimensional (3D) images and video have led human vision into a new generation and are bringing on the next revolution, because the content provided for 3D display is much closer to the real world. Devices that directly capture 3D content are costly and, compared with 2D cameras, still uncommon; moreover, the tremendous amount of current and legacy media in 2D format should also be viewable with a stereoscopic effect. For these reasons, 2D-to-3D conversion has become a practical and urgent solution for 3D content providers.
Unlike traditional two-dimensional (2D) content, 3D virtual-view generation requires depth information. Many depth cues can be used to reconstruct 3D information from a 2D image. To handle outdoor scenes, which usually contain a horizon, we choose the relative-height cue because it is the most general. For an input image, we first apply an efficient graph-based image segmentation to obtain regions. These regions are then fed into a semantic and surface machine-learning system, which provides the main information for estimating the depth map. Second, we identify salient regions, those that are visually most noticeable, to provide semantic information for the subsequent stages. Horizon detection helps us not only to estimate depth values but also to improve semantic and surface classification. Moreover, because humans are a major challenge in the semantic-labeling stage, we apply HOG-based human detection to complement the system. Depth estimation based on the relative-height cue is then performed after these processes.
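The combination of semantic labels with the relative-height cue can be illustrated as follows. This is a minimal sketch under assumed conventions (three labels `sky`/`ground`/`vertical`, a linear ground ramp, and vertical regions inheriting the depth of their lowest pixel), not the thesis's implementation.

```python
import numpy as np

def depth_from_labels(labels, horizon_row):
    """Sketch: fuse per-pixel semantic labels with the relative-height cue.
    labels: 2-D array of strings in {'sky', 'ground', 'vertical'}.
    Sky gets maximum depth; ground depth grows linearly from the bottom
    row toward the horizon; a 'vertical' pixel inherits the ground depth
    at the lowest row of its column (where the object meets the ground)."""
    h, w = labels.shape
    rows = np.arange(h).reshape(-1, 1)
    ground = np.clip((h - 1 - rows) / max(h - 1 - horizon_row, 1), 0.0, 1.0)
    ground = np.broadcast_to(ground, (h, w))
    depth = np.where(labels == 'sky', 1.0, ground)  # sky is farthest
    for c in range(w):
        vert = np.where(labels[:, c] == 'vertical')[0]
        if vert.size:
            # Assign the whole vertical run the depth at its footprint row.
            depth[vert, c] = ground[vert.max(), c]
    return depth
```

Treating a vertical region as fronto-parallel (constant depth equal to its footprint) is the standard simplification behind relative-height methods; it is used here only to make the labeling-to-depth step concrete.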