Student: 孫誠 Sun, Cheng
Thesis Title: 從 360 全景影像重建不同細節層次之室內幾何 / Indoor Geometry Reconstruction from 360 Images to Different Levels of Detail
Advisors: 陳煥宗 Chen, Hwann-Tzong; 孫民 Sun, Min
Committee Members: 劉庭祿 Liu, Tyng-Luh; 王鈺強 Wang, Yu-Chiang; 莊永裕 Chuang, Yung-Yu; 林彥宇 Lin, Yen-Yu; 劉育綸 Liu, Yu-Lun
Degree: Doctoral
Department: Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 111
Language: English
Pages: 135
Keywords: 360, panorama, indoor geometry reconstruction, room layout, plane detection, depth estimation, gravity alignment
This research aims to reconstruct geometry at different levels of detail from a single panoramic image captured in an indoor scene.
Specifically, we focus on the high-level room layout, mid-level planes, and low-level per-pixel depth, which are in demand by various downstream applications (e.g., room layouts for real-estate showcases, planes for robot navigation and obstacle avoidance, depth for general scene understanding).
Reconstructing geometry from color-only perception is challenging and inherently ill-posed.
Fortunately, human-made structures exhibit strong regularity aligned with the gravity direction.
In addition, by applying vanishing-point analysis over the omnidirectional field of view, we can easily align the y-axis of an equirectangular panorama with the gravity direction.
Our methods are primarily motivated by this gravity-alignment prior.
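As an illustration of this alignment step, the following minimal NumPy sketch resamples an equirectangular panorama given a 3x3 rotation R (e.g., estimated from vanishing points) that maps gravity-aligned viewing directions back to the original camera frame. It is a simplified stand-in for the actual preprocessing; the function name and nearest-neighbour sampling are assumptions made for brevity:

```python
import numpy as np

def align_equirect_to_gravity(pano, R):
    """Resample an equirectangular panorama so its y-axis follows gravity.

    pano: (H, W, 3) image. R: 3x3 rotation taking gravity-aligned viewing
    directions back to the original (tilted) camera frame.
    """
    H, W = pano.shape[:2]
    lon = (np.arange(W) + 0.5) / W * 2 * np.pi - np.pi        # longitudes
    lat = (np.arange(H) + 0.5) / H * np.pi - np.pi / 2        # latitudes
    lon, lat = np.meshgrid(lon, lat)                          # (H, W) grids
    # Unit viewing direction of every output pixel (gravity-aligned frame).
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)     # (H, W, 3)
    # Rotate into the original camera frame (row vectors: d @ R.T == R @ d).
    dirs = dirs @ R.T
    src_lon = np.arctan2(dirs[..., 0], dirs[..., 2])
    src_lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))
    # Nearest-neighbour lookup; bilinear sampling would be smoother.
    x = (((src_lon + np.pi) / (2 * np.pi)) * W).astype(int) % W
    y = np.clip(((src_lat + np.pi / 2) / np.pi * H).astype(int), 0, H - 1)
    return pano[y, x]
```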
To reconstruct the room layout, we observe that, owing to gravity alignment, wall-wall boundaries appear vertical in the image, while the wall-ceiling and wall-floor boundaries each appear exactly once per image column.
Based on this observation, we reformulate the problem as a horizontal 1D (per-column) regression task instead of the conventional 2D heatmap prediction task.
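As a sketch of this 1D reformulation (layer sizes and the pooling choice are illustrative, not the exact published architecture), a per-column regression head in PyTorch could look like the following, collapsing features along the gravity-aligned y-axis and predicting three values per column:

```python
import torch
import torch.nn as nn

class PerColumnLayoutHead(nn.Module):
    """Regress, for every image column, the ceiling-wall boundary position,
    the floor-wall boundary position, and a wall-wall corner probability."""

    def __init__(self, in_ch=256, hidden=128):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d((1, None))  # (B,C,H,W) -> (B,C,1,W)
        self.rnn = nn.LSTM(in_ch, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 3)            # ceiling y, floor y, corner

    def forward(self, feat):                 # feat: (B, C, H, W) backbone features
        x = self.squeeze(feat).squeeze(2)    # collapse the gravity axis -> (B, C, W)
        x = x.permute(0, 2, 1)               # columns become a sequence: (B, W, C)
        x, _ = self.rnn(x)                   # propagate context across columns
        return self.head(x)                  # (B, W, 3): one prediction per column

# Usage: PerColumnLayoutHead()(torch.randn(1, 256, 16, 256))  # -> (1, 256, 3)
```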
To segment plane instances, we advocate focusing on horizontal and vertical planes: they are simple yet still capture the gist of indoor scenes.
To this end, we construct a large-scale dataset of horizontal and vertical plane instances in 360 panoramas and present a new divide-and-conquer strategy that facilitates detecting thin plane structures in panoramic images.
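As a toy illustration of why horizontal and vertical planes are convenient to handle separately, the snippet below clusters pixels predicted as horizontal planes into instances using only a per-pixel plane-height map. It is a simplified stand-in, not the divide-and-conquer procedure itself; vertical planes would analogously be grouped by normal direction and offset:

```python
import numpy as np

def group_horizontal_planes(h_mask, height, tol=0.05):
    """Group horizontal-plane pixels into instances by 1D clustering of
    their predicted plane heights.

    h_mask: (H, W) bool mask of horizontal-plane pixels.
    height: (H, W) predicted plane height per pixel (metres).
    tol:    maximum height gap allowed within one instance.
    Returns an (H, W) int label map, 0 = background.
    """
    labels = np.zeros(h_mask.shape, dtype=int)
    idx = np.flatnonzero(h_mask)
    if idx.size == 0:
        return labels
    vals = height.ravel()[idx]
    order = np.argsort(vals)
    # Start a new instance whenever the sorted heights jump by more than tol.
    breaks = np.where(np.diff(vals[order]) > tol)[0] + 1
    for inst_id, chunk in enumerate(np.split(order, breaks), start=1):
        labels.flat[idx[chunk]] = inst_id
    return labels
```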
To estimate dense depth, we redesign the model architecture to aggregate deep features along the gravity direction (the image y-axis), resulting in a compact 1D horizontal feature encoding.
Thanks to gravity alignment, the proposed efficient deep neural network achieves superior quality compared to previous methods that rely on 2D feature encodings.
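The following toy PyTorch decoder conveys the idea of the compact 1D horizontal encoding (the actual compression and expansion blocks in this work differ): 2D backbone features are squeezed along the gravity axis, processed by cheap 1D convolutions with circular padding to respect the panorama's horizontal wrap-around, and expanded back into a dense depth map:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Horizontal1DDepth(nn.Module):
    """Toy depth decoder built on a compact 1D horizontal feature encoding."""

    def __init__(self, in_ch=256, mid=256, out_h=512):
        super().__init__()
        self.out_h = out_h
        # Circular padding matches the horizontal wrap-around of panoramas.
        self.encode = nn.Sequential(
            nn.Conv1d(in_ch, mid, 3, padding=1, padding_mode='circular'),
            nn.ReLU(inplace=True),
            nn.Conv1d(mid, out_h, 3, padding=1, padding_mode='circular'))

    def forward(self, feat, out_w=1024):     # feat: (B, C, H', W') 2D features
        x = feat.mean(dim=2)                 # squeeze gravity axis -> (B, C, W')
        x = self.encode(x)                   # (B, out_h, W'): depths per column
        x = x.unsqueeze(1)                   # (B, 1, out_h, W')
        x = F.interpolate(x, size=(self.out_h, out_w),
                          mode='bilinear', align_corners=False)
        return x.squeeze(1)                  # (B, out_h, out_w) depth map

# Usage: Horizontal1DDepth()(torch.randn(1, 256, 16, 64))  # -> (1, 512, 1024)
```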
In summary, we have developed three methods that exploit the gravity-alignment prior of 360 panoramas to reconstruct indoor geometry at the layout, plane, and pixel levels. These methods achieved state-of-the-art results at the time of publication.
All code developed in this work is publicly available for reproduction and future extension.
- 360 Room Layout: https://github.com/sunset1995/HorizonNet
- 360 Plane Detection: https://github.com/sunset1995/PanoPlane360
- 360 Dense Depth: https://github.com/sunset1995/HoHoNet