
Author: Huang, Sheng-Long (黃聖龍)
Title: End-to-End 3D Object Reconstruction from Single Images using Consistent Multi-View Generation and MAT Dual-Structure Skeleton
Advisor: Lee, Che-Rung (李哲榮)
Committee Members: Lee, Ruen-Rone (李潤容); Chu, Hung-Kuo (朱宏國)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications
Year of Publication: 2024
Graduation Academic Year: 113 (ROC calendar, 2024-2025)
Language: English
Number of Pages: 44
Keywords: 3D Reconstruction, Deep Learning, Multi-View Diffusion Model, Medial Axis Transformation


With the development of deep learning and the proliferation of 3D shape datasets, neural networks can effectively encode the hidden structural information in images, achieving significant success in the field of 3D reconstruction. However, existing methods often fail to capture geometric details, resulting in insufficient accuracy and robustness that lead to blurred or distorted reconstructions. In this paper, we propose an innovative single-view 3D reconstruction method, called EDSNet, that mimics the modeling approach of designers and integrates feedforward generation techniques, using multi-view images as an auxiliary medium. First, EDSNet decomposes the object in an image into a fine structure and a coarse structure, learning and optimizing the medial axis transform (MAT) representation of each structure separately. Next, it generates the contours of the structures to build their shapes. To recognize the 3D geometry of both visible and invisible parts, we design a cross-view attention mechanism that exchanges information among multiple views, enhancing the model's ability to understand the parts of an object from a single view. Experimental results on the ShapeNet dataset demonstrate significant improvements over existing methods in evaluation metrics such as F-Score, Chamfer distance, and Volume Intersection over Union (Volume IoU). Specifically, EDSNet achieves an F-Score of 0.8986, a Chamfer distance of 0.0223, and a Volume IoU of 0.5975, outperforming most state-of-the-art methods.
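
The record does not detail the cross-view attention mechanism, so the following is only a minimal, hypothetical PyTorch sketch of the general idea the abstract describes: the tokens of each view attend to the tokens of all other views so that information is exchanged across viewpoints. The module name, tensor shapes, and layer sizes are illustrative assumptions, not the thesis architecture.

```python
# Hypothetical sketch of cross-view attention, NOT the EDSNet implementation.
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, views):
        # views: (B, V, T, C) = batch, number of views, tokens per view, channels
        B, V, T, C = views.shape
        out = []
        for v in range(V):
            q = views[:, v]                                  # queries from view v: (B, T, C)
            others = [u for u in range(V) if u != v]
            kv = views[:, others].reshape(B, (V - 1) * T, C) # keys/values from all other views
            attended, _ = self.attn(q, kv, kv)
            out.append(self.norm(q + attended))              # residual connection + norm
        return torch.stack(out, dim=1)                       # (B, V, T, C)

# Usage: 6 generated views, 64 tokens each, 256 channels.
x = torch.randn(2, 6, 64, 256)
y = CrossViewAttention()(x)
print(y.shape)  # torch.Size([2, 6, 64, 256]), with cross-view information mixed in
```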
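
The three reported metrics are standard in single-view reconstruction. As a reference for how they are typically computed, here is a minimal NumPy sketch (not the thesis evaluation code): Chamfer distance and F-Score operate on sampled point sets, Volume IoU on boolean occupancy grids. Published variants differ in details (e.g., squared vs. unsquared Chamfer terms), and the threshold `tau` is an assumed parameter.

```python
# Common textbook formulations of the reported metrics, NOT the thesis code.
import numpy as np

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between point sets pred (N,3) and gt (M,3)."""
    # Pairwise squared distances; fine for small N, M (use a KD-tree at scale).
    d2 = np.sum((pred[:, None, :] - gt[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def f_score(pred, gt, tau=0.01):
    """F-Score at distance threshold tau (tau is an assumed parameter)."""
    d2 = np.sum((pred[:, None, :] - gt[None, :, :]) ** 2, axis=-1)
    precision = (np.sqrt(d2.min(axis=1)) < tau).mean()  # pred points near gt
    recall = (np.sqrt(d2.min(axis=0)) < tau).mean()     # gt points near pred
    denom = precision + recall
    return 2 * precision * recall / denom if denom > 0 else 0.0

def volume_iou(occ_a, occ_b):
    """Volume IoU between two boolean occupancy grids of the same shape."""
    union = np.logical_or(occ_a, occ_b).sum()
    return np.logical_and(occ_a, occ_b).sum() / union if union > 0 else 0.0
```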

Table of Contents:
Chinese Abstract
Abstract
List of Figures
List of Tables
Chapter 1  Introduction
Chapter 2  Related Work
  2.1  Single-View 3D Reconstruction Based on Deep Learning
    2.1.1  Voxel Representation
    2.1.2  Point Cloud Representation
    2.1.3  Implicit Function Representation
    2.1.4  Mesh Representation
  2.2  Medial-Axis Transformation and 3D Shapes
  2.3  3D Reconstruction Using 2D Prior Diffusion Models
Chapter 3  Method
  3.1  Overview
  3.2  Multi-View Generation Using Diffusion Models
  3.3  Cross-View Attention
  3.4  Reconstructing Medial Spheres Skeleton from RGB Images
  3.5  Dual-Structure Parallel Learning Module
  3.6  Connectivity Generation
Chapter 4  Experiments
  4.1  Dataset and Implementation Details
  4.2  Qualitative Results
  4.3  Comparison to Other Models
  4.4  Ablation Study
Chapter 5  Conclusion and Future Work
References

References:
[1] Shi, R., et al. "Zero123++: A Single Image to Consistent Multi-view Diffusion Base Model." arXiv:2310.15110, 2023.
[2] Li, P., et al. "Q-MAT: Computing Medial Axis Transform by Quadratic Error Minimization." ACM Transactions on Graphics 35(1), Article 8, 2016.
[3] Chang, A. X., et al. "ShapeNet: An Information-Rich 3D Model Repository." arXiv:1512.03012, 2015.
[4] Lin, Chen-Hsuan, et al. "Magic3D: High-Resolution Text-to-3D Content Creation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[5] Liu, Ruoshi, et al. "Zero-1-to-3: Zero-shot One Image to 3D Object." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[6] Liu, Y., et al. "SyncDreamer: Generating Multiview-consistent Images from a Single-view Image." arXiv:2309.03453, 2023.
[7] Liu, Minghua, et al. "One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds Without Per-Shape Optimization." Advances in Neural Information Processing Systems 36, 2024.
[8] Ji, S., et al. "3D Convolutional Neural Networks for Human Action Recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 35(1), pp. 221-231, 2013.
[9] Lorensen, W. E., and H. E. Cline. "Marching Cubes: A High Resolution 3D Surface Construction Algorithm." SIGGRAPH Computer Graphics 21(4), pp. 163-169, 1987.
[10] Choy, Christopher B., et al. "3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction." Computer Vision - ECCV 2016, Part VIII, Springer International Publishing, 2016.
[11] Wu, Jiajun, et al. "MarrNet: 3D Shape Reconstruction via 2.5D Sketches." Advances in Neural Information Processing Systems 30, 2017.
[12] Tatarchenko, Maxim, Alexey Dosovitskiy, and Thomas Brox. "Octree Generating Networks: Efficient Convolutional Architectures for High-Resolution 3D Outputs." Proceedings of the IEEE International Conference on Computer Vision, 2017.
[13] Qi, Charles R., et al. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[14] Qi, Charles Ruizhongtai, et al. "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space." Advances in Neural Information Processing Systems 30, 2017.
[15] Li, C.-L., et al. "Point Cloud GAN." arXiv:1810.05795, 2018.
[16] Sun, Yongbin, et al. "PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020.
[17] Xu, Qiangeng, et al. "DISN: Deep Implicit Surface Network for High-Quality Single-View 3D Reconstruction." Advances in Neural Information Processing Systems 32, 2019.
[18] Li, Yangyan, et al. "FPNN: Field Probing Neural Networks for 3D Data." Advances in Neural Information Processing Systems 29, 2016.
[19] Jun, H., and A. Nichol. "Shap-E: Generating Conditional 3D Implicit Functions." arXiv:2305.02463, 2023.
[20] Wang, Nanyang, et al. "Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images." Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[21] Xu, Gang, et al. "Temporal Modulation Network for Controllable Space-Time Video Super-Resolution." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[22] Lin, Cheng, et al. "Point2Skeleton: Learning Skeletal Representations from Point Clouds." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
[23] Tang, Jiapeng, et al. "A Skeleton-Bridged Deep Learning Approach for Generating Meshes of Complex Topologies from Single RGB Images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
[24] Hu, J., et al. "IMMAT: Mesh Reconstruction from Single View Images by Medial Axis Transform Prediction." Computer-Aided Design 150(C), 2022.
[25] Hu, J., et al. "S3DS: Self-supervised Learning of 3D Skeletons from Single View Images." Proceedings of the 31st ACM International Conference on Multimedia, pp. 6948-6958, 2023.
[26] Radford, Alec, et al. "Learning Transferable Visual Models from Natural Language Supervision." International Conference on Machine Learning, PMLR, 2021.
[27] Poole, B., et al. "DreamFusion: Text-to-3D Using 2D Diffusion." arXiv:2209.14988, 2022.
[28] Ramesh, Aditya, et al. "Zero-Shot Text-to-Image Generation." International Conference on Machine Learning, PMLR, 2021.
[29] Rombach, Robin, et al. "High-Resolution Image Synthesis with Latent Diffusion Models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
[30] He, Kaiming, et al. "Deep Residual Learning for Image Recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[31] Liu, Shichen, et al. "Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
[32] Melas-Kyriazi, Luke, et al. "RealFusion: 360° Reconstruction of Any Object from a Single Image." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[33] Qian, G., et al. "Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors." arXiv:2306.17843, 2023.
