
Author: Chang, Keng-Wei (張耕維)
Thesis Title: KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences
Advisor: Lai, Shang-Hong (賴尚宏)
Committee Members: Hsu, Chiu-Ting (許秋婷); Chu, Hung-Kuo (朱宏國); Hsu, Gee-Sern (徐繼聖)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 113 (ROC calendar)
Language: English
Number of Pages: 37
Keywords (Chinese): Gaussian Splatting, Monocular Images, 3D Reconstruction
Keywords (English): 3DGS, Gaussian Splatting, Monocular Image


Abstract:
Reconstructing high-quality 3D models from sparse 2D images has garnered significant attention in computer vision. Recently, 3D Gaussian Splatting (3DGS) has gained prominence due to its explicit representation, efficient training speed, and real-time rendering capabilities. However, existing methods still heavily depend on accurate camera poses for reconstruction. Although some recent approaches attempt to train 3DGS models directly from monocular video datasets without Structure-from-Motion (SfM) preprocessing, these methods suffer from prolonged training times, making them impractical for many applications.
In this paper, we present an efficient framework that operates without any depth or matching model. Our approach first uses SfM to obtain rough camera poses within seconds, and then refines these poses by leveraging the dense representation in 3DGS. This framework effectively addresses the issue of long training times. Additionally, we integrate the densification process with joint refinement and propose a coarse-to-fine frequency-aware densification to reconstruct different levels of detail. This approach prevents camera pose estimation from becoming trapped in local minima or drifting due to high-frequency signals. Our method significantly reduces training time from hours to minutes while achieving more accurate novel view synthesis and camera pose estimation than previous methods.
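
To make the two ideas in the abstract concrete, below is a minimal illustrative sketch (not the thesis implementation) of jointly refining camera poses and Gaussian parameters under a coarse-to-fine frequency-aware schedule: both the rendered image and the target frame are low-pass filtered with a Gaussian blur whose sigma is annealed to zero, so early pose gradients come from smooth low-frequency structure rather than misaligned high-frequency detail. The names used here (render, gaussians, poses, blur_sigma) are hypothetical placeholders, not identifiers from the thesis.

    import torch
    import torch.nn.functional as F
    import torchvision.transforms.functional as TF

    def blur_sigma(step: int, total_steps: int, sigma_max: float = 4.0) -> float:
        # Coarse-to-fine schedule: start with a strong low-pass filter, anneal to none.
        return sigma_max * max(0.0, 1.0 - step / total_steps)

    def train(gaussians, poses, images, render, total_steps: int = 3000):
        # Assumptions: `gaussians` and `poses` are torch.nn.Module objects, where
        # `gaussians` holds the 3DGS parameters and `poses` holds learnable se(3)
        # corrections on top of the rough SfM poses; `render(gaussians, pose)` is a
        # differentiable rasterizer returning a (C, H, W) image tensor.
        opt = torch.optim.Adam([
            {"params": gaussians.parameters(), "lr": 1e-2},
            {"params": poses.parameters(), "lr": 1e-4},
        ])
        for step in range(total_steps):
            frame_id = step % len(images)
            target = images[frame_id]                  # (C, H, W) ground-truth frame
            pred = render(gaussians, poses(frame_id))  # differentiable rendering

            sigma = blur_sigma(step, total_steps)
            if sigma > 0.1:
                # Low-pass both prediction and target so pose gradients are driven by
                # low-frequency structure early on, reducing the risk of local minima
                # or drift caused by high-frequency signals.
                k = int(2 * round(3 * sigma) + 1)      # odd kernel size, roughly 6*sigma
                pred = TF.gaussian_blur(pred, k, sigma)
                target = TF.gaussian_blur(target, k, sigma)

            loss = F.l1_loss(pred, target)
            opt.zero_grad()
            loss.backward()                            # gradients reach both Gaussians and poses
            opt.step()
            # In the full method, densification of the Gaussians would be interleaved here.

In the actual framework the densification process is additionally coupled with this joint refinement (the "frequency-aware densification" the abstract refers to); the sketch only shows the frequency annealing and the shared optimization of poses and Gaussians.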

Table of Contents:
1 Introduction
  1.1 3D Reconstruction
  1.2 Motivation
  1.3 Contributions
2 Related Work
  2.1 Novel View Synthesis
    2.1.1 Neural Radiance Field
    2.1.2 3D Gaussian Splatting
  2.2 Structure from Motion
  2.3 Joint Refinement
    2.3.1 Joint Refinement of Noisy Poses
    2.3.2 Unposed Joint Refinement
  2.4 Coarse-to-Fine Strategy
    2.4.1 BARF: Bundle-Adjusting Neural Radiance Fields
    2.4.2 Joint TensoRF
3 Methodology
  3.1 Preliminary: 3D Gaussian Splatting
  3.2 KeyGS Framework
  3.3 Joint Refinement and Densification
    3.3.1 Relationship Establishment
    3.3.2 Signal Alignment and Gradient
  3.4 Coarse-to-Fine Frequency-Aware Densification
    3.4.1 Analysis
    3.4.2 Smooth Signal
  3.5 Regularization
  3.6 Implementation Details
4 Experiment
  4.1 Evaluation and Metrics
  4.2 Data Preprocessing
  4.3 Tanks and Temples Dataset
    4.3.1 Quantitative Comparison
    4.3.2 Visualization
  4.4 CO3DV2 Dataset
    4.4.1 Quantitative Comparison
    4.4.2 Visualization
  4.5 Ablation Study
    4.5.1 Significance of Key Components
    4.5.2 Impact of Camera Pose Refinement
    4.5.3 Impact of Keyframe Interval
    4.5.4 Better Performance for Joint Refinement
  4.6 Limitations
5 Conclusion
References

