
Author: Chang, Keng-Wei (張耕維)
Thesis Title: KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences
Advisor: Lai, Shang-Hong (賴尚宏)
Committee Members: Hsu, Chiu-Ting (許秋婷); Chu, Hung-Kuo (朱宏國); Hsu, Gee-Sern (徐繼聖)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 113 (ROC calendar)
Language: English
Number of Pages: 37
Keywords (Chinese): Gaussian Splatting, Monocular Images, 3D Reconstruction
Keywords (English): 3DGS, Gaussian Splatting, Monocular Image


Abstract:
Reconstructing high-quality 3D models from sparse 2D images has garnered significant attention in computer vision. Recently, 3D Gaussian Splatting (3DGS) has gained prominence due to its explicit representation, efficient training speed, and real-time rendering capabilities. However, existing methods still heavily depend on accurate camera poses for reconstruction. Although some recent approaches attempt to train 3DGS models directly from monocular video datasets without Structure-from-Motion (SfM) preprocessing, these methods suffer from prolonged training times, making them impractical for many applications.
In this paper, we present an efficient framework that operates without any depth or matching model. Our approach first uses SfM to obtain rough camera poses within seconds, and then refines these poses by leveraging the dense representation in 3DGS. This framework effectively addresses the issue of long training times. Additionally, we integrate the densification process with joint refinement and propose a coarse-to-fine frequency-aware densification to reconstruct different levels of detail. This approach prevents camera pose estimation from becoming trapped in local minima or drifting due to high-frequency signals. Our method significantly reduces training time from hours to minutes while achieving more accurate novel view synthesis and camera pose estimation than previous methods.
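
To make the two ideas in the abstract concrete, below is a minimal illustrative sketch (not the thesis implementation) of jointly refining camera poses and Gaussian parameters under a coarse-to-fine frequency-aware schedule: both the rendered image and the target frame are low-pass filtered with a Gaussian blur whose sigma is annealed to zero, so early pose gradients come from smooth low-frequency structure rather than misaligned high-frequency detail. The names used here (render, gaussians, poses, blur_sigma) are hypothetical placeholders, not identifiers from the thesis.

    import torch
    import torch.nn.functional as F
    import torchvision.transforms.functional as TF

    def blur_sigma(step: int, total_steps: int, sigma_max: float = 4.0) -> float:
        # Coarse-to-fine schedule: start with a strong low-pass filter, anneal to none.
        return sigma_max * max(0.0, 1.0 - step / total_steps)

    def train(gaussians, poses, images, render, total_steps: int = 3000):
        # Assumptions: `gaussians` and `poses` are torch.nn.Module objects, where
        # `gaussians` holds the 3DGS parameters and `poses` holds learnable se(3)
        # corrections on top of the rough SfM poses; `render(gaussians, pose)` is a
        # differentiable rasterizer returning a (C, H, W) image tensor.
        opt = torch.optim.Adam([
            {"params": gaussians.parameters(), "lr": 1e-2},
            {"params": poses.parameters(), "lr": 1e-4},
        ])
        for step in range(total_steps):
            frame_id = step % len(images)
            target = images[frame_id]                  # (C, H, W) ground-truth frame
            pred = render(gaussians, poses(frame_id))  # differentiable rendering

            sigma = blur_sigma(step, total_steps)
            if sigma > 0.1:
                # Low-pass both prediction and target so pose gradients are driven by
                # low-frequency structure early on, reducing the risk of local minima
                # or drift caused by high-frequency signals.
                k = int(2 * round(3 * sigma) + 1)      # odd kernel size, roughly 6*sigma
                pred = TF.gaussian_blur(pred, k, sigma)
                target = TF.gaussian_blur(target, k, sigma)

            loss = F.l1_loss(pred, target)
            opt.zero_grad()
            loss.backward()                            # gradients reach both Gaussians and poses
            opt.step()
            # In the full method, densification of the Gaussians would be interleaved here.

In the actual framework the densification process is additionally coupled with this joint refinement (the "frequency-aware densification" the abstract refers to); the sketch only shows the frequency annealing and the shared optimization of poses and Gaussians.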

Table of Contents:
1 Introduction
  1.1 3D Reconstruction
  1.2 Motivation
  1.3 Contributions
2 Related Work
  2.1 Novel View Synthesis
    2.1.1 Neural Radiance Field
    2.1.2 3D Gaussian Splatting
  2.2 Structure from Motion
  2.3 Joint Refinement
    2.3.1 Joint Refinement of Noisy Poses
    2.3.2 Unposed Joint Refinement
  2.4 Coarse-to-Fine Strategy
    2.4.1 BARF: Bundle-Adjusting Neural Radiance Fields
    2.4.2 Joint TensoRF
3 Methodology
  3.1 Preliminary: 3D Gaussian Splatting
  3.2 KeyGS Framework
  3.3 Joint Refinement and Densification
    3.3.1 Relationship Establishment
    3.3.2 Signal Alignment and Gradient
  3.4 Coarse-to-Fine Frequency-Aware Densification
    3.4.1 Analysis
    3.4.2 Smooth Signal
  3.5 Regularization
  3.6 Implementation Details
4 Experiment
  4.1 Evaluation and Metrics
  4.2 Data Preprocessing
  4.3 Tanks and Temples Dataset
    4.3.1 Quantitative Comparison
    4.3.2 Visualization
  4.4 CO3DV2 Dataset
    4.4.1 Quantitative Comparison
    4.4.2 Visualization
  4.5 Ablation Study
    4.5.1 Significance of Key Components
    4.5.2 Impact of Camera Pose Refinement
    4.5.3 Impact of Keyframe Interval
    4.5.4 Better Performance for Joint Refinement
  4.6 Limitations
5 Conclusion
References

