
Graduate Student: Xie, Pei-Xi (謝沛錫)
Thesis Title: MVRNet: Multimodal Volumetric Representation Network for Monocular Clothed Human Reconstruction
Advisor: Lee, Chi-Chun (李祈均)
Oral Defense Committee: Hu, Min-Chun (胡敏君), Lin, I-Chen (林奕成), Huang, Ching-Chun (黃敬群)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2023
Graduation Academic Year: 112
Language: English
Number of Pages: 33
Keywords: Monocular Clothed Human Reconstruction, 3D Computer Vision, Multi-modal Vision, Parametric Body Models


    Reconstructing 3D human models from single images is a challenging task with significant implications for human-computer interaction, particularly in the realm of virtual avatars. State-of-the-art methods often rely on multi-level cascaded models to achieve higher-resolution 2D representations, or leverage the SMPL model to infer 3D relationships. However, these approaches can be time-consuming, and heavy reliance on the SMPL model can lead to suboptimal results, especially for loose-fitting clothing. To overcome these limitations, we introduce the Multimodal Volumetric Representation Network (MVRNet), a novel deep neural network that employs weak parametric-model conditioning and multimodal feature fusion to improve the 3D representation obtained from a single 2D image. Crucially, our model is trained solely on publicly available datasets, and we conduct extensive evaluations on diverse datasets and in-the-wild scenarios, demonstrating superior performance in terms of accuracy, robustness, and generalization ability compared to state-of-the-art methods. Our research contributes to the advancement of robust 3D clothed human reconstruction, particularly in challenging poses, with potential applications in AR/VR, animation and film production, and the entertainment industry.
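To make the abstract's architectural idea concrete, the following is a minimal, hypothetical sketch of how a pixel-aligned implicit occupancy query can fuse an image feature with a weak parametric-body cue (here a signed distance to a body surface). All shapes, the orthographic projection, the nearest-neighbour sampling, and the one-layer "MLP" are illustrative assumptions, not the thesis's actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthographic_project(points):
    """Project 3D query points to 2D image coordinates (orthographic camera)."""
    return points[:, :2]

def sample_pixel_features(feature_map, uv):
    """Nearest-neighbour sampling of an (H, W, C) feature map at uv in [-1, 1]."""
    h, w, _ = feature_map.shape
    x = np.clip(((uv[:, 0] + 1) / 2 * (w - 1)).round().astype(int), 0, w - 1)
    y = np.clip(((uv[:, 1] + 1) / 2 * (h - 1)).round().astype(int), 0, h - 1)
    return feature_map[y, x]

def occupancy(points, feature_map, body_sdf, weights):
    """Fuse pixel-aligned image features, query depth, and a parametric-body
    signed-distance cue, then map the fused vector to an occupancy probability."""
    uv = orthographic_project(points)
    img_feat = sample_pixel_features(feature_map, uv)       # (N, C) image cue
    depth = points[:, 2:3]                                  # (N, 1) depth cue
    sdf = body_sdf(points)[:, None]                         # (N, 1) weak body prior
    fused = np.concatenate([img_feat, depth, sdf], axis=1)  # multimodal fusion
    logits = fused @ weights                                # toy one-layer MLP
    return 1.0 / (1.0 + np.exp(-logits))                    # sigmoid -> (0, 1)

# Toy usage: random feature map, a unit-sphere "body" SDF, random weights.
feature_map = rng.standard_normal((8, 8, 4))
body_sdf = lambda p: np.linalg.norm(p, axis=1) - 1.0
weights = rng.standard_normal((6, 1))
points = rng.uniform(-1, 1, size=(5, 3))
probs = occupancy(points, feature_map, body_sdf, weights)
print(probs.shape)
```

In a full pipeline of this kind, a mesh would then be extracted from the occupancy field with Marching Cubes; the body-SDF term is what "weak" conditioning refers to, since it biases rather than constrains the surface, leaving room for loose clothing to deviate from the parametric body.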

    Acknowledgements
    摘要 (Chinese Abstract)
    Abstract
    1 Introduction
    2 Related Works
      2.1 Single-view Human Reconstruction
      2.2 Feature Encoder of Implicit Function
    3 Surface Representation
      3.1 Implicit Function
      3.2 Parametric Body Model
    4 Method
      4.1 Clothed Normal Estimation
      4.2 MVRNet
    5 Experiments
      5.1 Datasets
      5.2 Network Architecture
      5.3 Training Details
      5.4 Evaluations
    6 Discussion
      6.1 Ablation Study
      6.2 Inference Time Consumption
      6.3 Limitations
    7 Conclusion
    A Supplementary
      A.1 Implicit Function
      A.2 Training Details
      A.3 SAIL-VOS 3D
    References

