| Field | Value |
|---|---|
| Graduate Student | 王昱婷 (Wang, Yu-Ting) |
| Thesis Title | 基於先驗幾何與殘差優化的自適應神經輻射場架構開發 (Development of Adaptive Neural Radiance Fields Architecture Based on Prior Geometry and Residual Optimization) |
| Advisor | 葉昭輝 (Yeh, Chao-Hui) |
| Committee Members | 李韋辰 (Li, Wei-Chen), 呂寧遠 (Lue, Ning-Yuan), 吳家鴻 (Wu, Chia-Hung) |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication | 2025 |
| Graduation Academic Year | 113 (ROC calendar) |
| Language | English |
| Pages | 73 |
| Keywords (Chinese) | 神經輻射場, 三維場景重建, 運動回復結構, 稀疏點雲 |
| Keywords (English) | Neural Radiance Fields, 3D Scene Reconstruction, Structure-from-Motion, Sparse Point Cloud |
In recent years, novel view synthesis based on Neural Radiance Fields (NeRF) has become a popular technique in 3D scene reconstruction, producing images that are more detailed and closer to reality than traditional methods. However, conventional NeRF depends heavily on the number of input viewpoints: when viewpoints are too few or unevenly distributed, the reconstruction is usually unsatisfactory and the rendered images fall short of the expected quality. To address this problem, this thesis proposes an improved NeRF model based on prior knowledge that raises reconstruction quality under insufficient or unevenly distributed viewpoints without increasing hardware resource consumption or computation time. The model combines Structure-from-Motion (SfM), a knowledge-based neural network framework, and the concept of residual learning.
The proposed model uses SfM to extract the relative pose information between images captured from different viewpoints and generates a sparse point cloud, which is introduced into the NeRF model as a geometric prior. The sparse point cloud supplies the basic structure and density distribution of the scene, laying the foundation for the subsequent density-learning stage. Guided by this prior knowledge, the model then learns a finer density distribution of the scene, compensating for the sparse point cloud's inability to capture fine detail. Through residual learning, the model focuses on the residual between the density estimated from the sparse point cloud and the true scene density, optimizing only the details the point cloud cannot cover rather than learning the scene from scratch; a sketch of how such a point-cloud prior could be built follows this paragraph.
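For illustration only, here is a minimal sketch of how a coarse density prior might be derived from an SfM sparse point cloud (for example, points triangulated by COLMAP). The function name `sparse_density_prior`, the Gaussian kernel, and the bandwidth value are assumptions made for this sketch, not the thesis's actual formulation.

```python
import numpy as np
from scipy.spatial import cKDTree

def sparse_density_prior(points_xyz: np.ndarray, bandwidth: float = 0.05):
    """Build a coarse density prior from an SfM sparse point cloud.

    points_xyz : (N, 3) array of 3D points triangulated by SfM (e.g. COLMAP).
    bandwidth  : kernel radius controlling how far each point's influence reaches.
    Returns a function mapping (M, 3) query positions to a non-negative prior
    density that is high near SfM points and decays to zero in empty space.
    """
    tree = cKDTree(points_xyz)

    def prior(query_xyz: np.ndarray) -> np.ndarray:
        # Distance to the nearest SfM point; nearby queries get high density.
        dist, _ = tree.query(query_xyz, k=1)
        return np.exp(-(dist / bandwidth) ** 2)

    return prior

if __name__ == "__main__":
    cloud = np.random.rand(1000, 3)      # stand-in for SfM output
    prior = sparse_density_prior(cloud)
    samples = np.random.rand(64, 3)      # stand-in for ray sample positions
    print(prior(samples).shape)          # (64,)
```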
The proposed model is compared comprehensively with NeRF on three datasets of different scales. For each dataset, we analyze how the number of input viewpoints affects performance and examine the reconstruction quality achieved under sparse-view conditions. The experimental results show that, compared with NeRF, the proposed method significantly improves reconstruction quality and detail when viewpoints are insufficient, demonstrating its generalization ability.
In recent years, Neural Radiance Fields (NeRF)-based novel view synthesis methods have become a popular technology in the field of 3D scene reconstruction. This technique can generate images that are more detailed and realistic compared to traditional methods. However, NeRF models heavily rely on multiple viewpoints. When the number of viewpoints is insufficient or their distribution is uneven, the reconstruction results are often suboptimal, leading to rendered images of lower quality than expected. To address this issue, this research proposes an enhanced NeRF model that incorporates prior knowledge to improve reconstruction performance under sparse or uneven viewpoints, without increasing computational cost or hardware requirements. The proposed model integrates Structure-from-Motion (SfM) techniques, a knowledge-driven neural network framework, and residual learning principles.
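For background, the image formation the abstract refers to is the standard NeRF volume rendering model: a learned density σ and view-dependent color c are accumulated along each camera ray r(t) = o + t d. This is the original formulation of Mildenhall et al., shown here for context; it is not the thesis's modified architecture.

```latex
\hat{C}(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma\big(\mathbf{r}(t)\big)\,
\mathbf{c}\big(\mathbf{r}(t),\mathbf{d}\big)\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma\big(\mathbf{r}(s)\big)\,ds\right)
```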
Our approach uses SfM to recover relative pose information from a limited set of multi-view images and to build sparse point clouds that serve as geometric priors. These point clouds establish the basic structure and density distribution of the scene, providing a foundation for the subsequent density-learning stage. The model then applies a prior-knowledge-based strategy to refine the density distribution within the scene, addressing the shortcomings of sparse point clouds in capturing intricate detail. Moreover, residual learning lets the model concentrate on the discrepancy between the densities estimated from the sparse point cloud and the actual scene density, rather than reconstructing the entire scene from scratch.
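A minimal PyTorch sketch of the residual idea described above: the density at a sample position is the sparse-point-cloud prior plus a learned correction, so the network only has to model what the prior misses. The class name `ResidualDensityField`, the layer sizes, and the ReLU clamp are illustrative assumptions, not the thesis's exact network.

```python
import torch
import torch.nn as nn

class ResidualDensityField(nn.Module):
    """Density head that refines a sparse-point-cloud prior via a residual MLP."""

    def __init__(self, in_dim: int = 3, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor, sigma_prior: torch.Tensor) -> torch.Tensor:
        # The network predicts only the residual between the SfM-derived prior
        # density and the true scene density, so regions the sparse point cloud
        # already explains need little correction.
        residual = self.mlp(x).squeeze(-1)
        return torch.relu(sigma_prior + residual)   # keep density non-negative

# Usage: sigma_prior could come from a routine like the sparse_density_prior
# sketch above, evaluated at the same ray sample positions.
field = ResidualDensityField()
x = torch.rand(64, 3)          # ray sample positions
sigma_prior = torch.rand(64)   # prior densities at those positions
sigma = field(x, sigma_prior)  # refined densities, shape (64,)
```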
Comprehensive experiments were conducted on three datasets of varying scales, comparing the proposed model with traditional NeRF approaches. The research also explores the effect of different numbers of input viewpoints on reconstruction performance, particularly under sparse conditions. Results show that the proposed method significantly improves reconstruction quality and detail representation compared to traditional NeRF, even with limited viewpoints, demonstrating its potential for broader applications in 3D scene reconstruction.