Graduate Student: Chou, Tzu-Hsiang (周子翔)
Thesis Title: An Autoencoder-Based Approach to Optimizing the Training Process of 3D Gaussian Splatting (基於自編碼器的三維高斯飛濺訓練流程最佳化方法)
Advisor: Ma, Hsi-Pin (馬席彬)
Committee Members: Tsai, Pei-Yun (蔡佩芸); Huang, Chih-Tsun (黃稚存)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science (電機資訊學院 - 電機工程學系)
Year of Publication: 2024
Graduation Academic Year: 113
Language: Chinese
Number of Pages: 63
Chinese Keywords: 三維高斯飛濺 (3D Gaussian splatting), 自編碼器 (autoencoder), 環境重建 (scene reconstruction)
Foreign-Language Keywords: 3D Gaussian splatting, Autoencoder, Scene reconstruction
In today's rapidly advancing technological world, 3D Gaussian splatting has become an important research focus in the field of image rendering. 3D Gaussian splatting transforms three-dimensional point cloud data into realistic two-dimensional images, showing great potential in applications such as virtual reality, 3D reconstruction, and autonomous driving. However, this technique faces challenges in computational complexity and rendering quality when processing scene data.
This study proposes an optimized training method and pipeline for 3D Gaussian splatting to improve its rendering quality. To preserve the real-time performance of 3D Gaussian splatting at inference time, all optimizations are applied only during training. First, image preprocessing is performed to improve the quality of the initial point cloud, and a new loss function is added to the 3D Gaussian splatting training objective. An autoencoder network is then used to repair defective regions produced during rendering. Finally, the trained autoencoder is applied for image restoration and incorporated into the 3D Gaussian splatting training loop.
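The pipeline is only summarized in prose here. As a rough illustration of the final step, a pre-trained autoencoder can be applied to each rendered view so that the training loss also pulls the Gaussian parameters toward the restored image. The PyTorch-style sketch below is an assumption about how such a step might look, not the thesis's actual implementation: the `render` function, the `autoencoder` module, and the `lambda_restore` weight are hypothetical names, and a plain L1 term stands in for the full 3D Gaussian splatting photometric loss.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: one training step of 3D Gaussian splatting with a
# frozen, pre-trained autoencoder used to restore the rendered image.
# `render`, `gaussians`, `autoencoder`, and `lambda_restore` are assumed
# names, not the thesis's actual code.
def training_step(gaussians, camera, gt_image, render, autoencoder,
                  optimizer, lambda_restore=0.2):
    optimizer.zero_grad()

    # Standard 3DGS forward pass: rasterize the Gaussians for this view.
    rendered = render(gaussians, camera)              # (3, H, W), values in [0, 1]

    # Frozen autoencoder repairs blurred / distorted regions of the rendering.
    with torch.no_grad():
        restored = autoencoder(rendered.unsqueeze(0)).squeeze(0)

    # Photometric loss against the ground-truth photo (L1 stands in for the
    # full 3DGS loss, which also includes a D-SSIM term).
    loss_photo = F.l1_loss(rendered, gt_image)

    # Extra term pulling the rendering toward the autoencoder-restored image.
    loss_restore = F.l1_loss(rendered, restored)

    loss = loss_photo + lambda_restore * loss_restore
    loss.backward()
    optimizer.step()
    return loss.item()
```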
With these optimized training methods, the rendering quality of 3D Gaussian splatting is improved, achieving strong performance across six indoor and five outdoor scene datasets and mitigating the blurring and distortion artifacts of the original method. Compared with the original 3D Gaussian splatting, PSNR increases by 0.5 to 1.5 dB, SSIM improves by 1% to 5%, and LPIPS decreases by 7% to 15% on most scene datasets. These results provide a new training approach for image rendering techniques and have the potential to play a significant role in a range of real-world applications.
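For context, the three metrics quoted above (PSNR, SSIM, LPIPS) can be computed as in the sketch below. This is a generic evaluation snippet, assuming images normalized to [0, 1]; the `torchmetrics` SSIM implementation and the `lpips` package with a VGG-16 backbone are stand-ins and may differ in detail from the evaluation code actually used in the thesis.

```python
import torch
import lpips                                             # pip install lpips
from torchmetrics.image import StructuralSimilarityIndexMeasure

# Generic evaluation sketch for the reported metrics; not the thesis's own code.

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

ssim_fn = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips_fn = lpips.LPIPS(net='vgg')   # VGG-16 features, a common choice for LPIPS

def evaluate(pred, target):
    """pred, target: (1, 3, H, W) tensors with values in [0, 1]."""
    return {
        "PSNR": psnr(pred, target).item(),
        "SSIM": ssim_fn(pred, target).item(),
        # LPIPS expects inputs scaled to [-1, 1]; lower is better.
        "LPIPS": lpips_fn(pred * 2 - 1, target * 2 - 1).item(),
    }
```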