Graduate student: | Yang, Shang-Ta (楊尚達) |
---|---|
Thesis title: | 3D Room Layout Estimation from a Single RGB Panorama using Deep Learning (基於深度學習之單張全景影像室內三維格局估測技術) |
Advisor: | Chu, Hung-Kuo (朱宏國) |
Oral defense committee: | Hu, Min-Chun (胡敏君); Yao, Chih-Yuan (姚智原) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2019 |
Academic year of graduation: | 107 |
Language: | Chinese |
Number of pages: | 41 |
Keywords (Chinese): | 深度學習、場景感知、格局預估 |
Keywords (English): | deep learning, scene understanding, layout estimation |
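This thesis proposes a deep learning framework that estimates the 3D layout of a room from a single RGB indoor panorama under the Manhattan-world assumption. To achieve higher prediction accuracy, we use two different projections of the panorama within a single model: the equirectangular panorama-view and the perspective ceiling-view. The two views carry different information about the room layout, and the network contains two encoder-decoder branches, one for analyzing each view. In addition, this thesis proposes a new feature fusion structure that combines the features of the two views inside the model, so that a single network is trained to predict both the 2D floor plan and the layout height. To learn and predict more complex 3D layout shapes, we also built a dataset containing a large number of complex 3D layouts. The final experimental results show that the proposed architecture and method exceed existing state-of-the-art algorithms in accuracy, with more pronounced improvements on multi-sided 3D layouts that are more complex than a cuboid.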
We present a deep learning framework that predicts Manhattan-world 3D room layouts from a single RGB panorama. To achieve better prediction accuracy, our method leverages two projections of the panorama at once, namely the equirectangular panorama-view and the perspective ceiling-view, each of which contains different clues about the room layout. Our network architecture consists of two encoder-decoder branches, one for analyzing each of the two views. In addition, a novel feature fusion structure is proposed to connect the two branches, which are then jointly trained to predict the 2D floor plans and layout heights. To learn more complex room layouts, we introduce the Realtor360 dataset, which contains panoramas of Manhattan-world room layouts with different numbers of corners. Experimental results show that our work outperforms the recent state of the art in prediction accuracy and performance, especially in rooms with non-cuboid layouts.
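To make the relation between the two views concrete, the following is a minimal sketch of how a pixel of an equirectangular panorama could be mapped into an upward-looking perspective image. It is not the thesis implementation: the function names, the axis convention (z pointing up), the 256-pixel output size, and the 160-degree field of view are all assumptions chosen for illustration.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit viewing direction.

    Assumed convention: u spans longitude [-pi, pi) left to right,
    v spans latitude [pi/2, -pi/2] top to bottom, and z points up.
    """
    lon = (u / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - v / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.cos(lat) * math.cos(lon)
    z = math.sin(lat)
    return (x, y, z)


def direction_to_ceiling_pixel(direction, size, fov_deg):
    """Project a direction onto a square perspective image looking straight up.

    Returns None when the direction is at or below the horizon, or when it
    falls outside the assumed field of view.
    """
    x, y, z = direction
    if z <= 0.0:
        return None
    # Focal length in pixels for the assumed field of view.
    focal = (size / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    px = size / 2.0 + focal * x / z
    py = size / 2.0 - focal * y / z
    if not (0.0 <= px < size and 0.0 <= py < size):
        return None
    return (px, py)


def panorama_to_ceiling_pixel(u, v, width, height, size=256, fov_deg=160.0):
    """Compose the two steps: panorama pixel -> direction -> ceiling-view pixel."""
    direction = equirect_pixel_to_direction(u, v, width, height)
    return direction_to_ceiling_pixel(direction, size, fov_deg)
```

Under these assumptions, a pixel on the top row of the panorama lands at the center of the ceiling image, and pixels on the horizon row have no image in the ceiling view. In such a projection, vertical walls collapse toward the outline of the room, which gives an intuition for why a ceiling view can expose floor-plan cues that the panorama view does not.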