
Author: Lin, Hung-Jin (林宏縉)
Thesis title: DeepRoom 2D/3D: Fit the Room with a Cuboid Model via Deep Networks (基於深度網路使用長方體模型的房間二維及三維結構估測)
Advisor: Lai, Shang-Hong (賴尚宏)
Oral defense committee: Chen, Hwann-Tzong (陳煥宗); Chiang, Chen-Kuo (江振國); Liu, Tyng-Luh (劉庭祿)
Degree: Master
Department:
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 46
Keywords: deep learning, scene understanding, indoor scene, layout estimation


    The growing demand for augmented reality applications makes scene understanding techniques increasingly important.
    Researchers have begun to develop deep learning solutions for indoor scene analysis, including room layout estimation from a single image.
    Although deep learning approaches have significantly improved accuracy on this problem, existing methods follow the long-established pipeline: they replace only the front-end of the conventional model and still rely heavily on post-processing for room layout reasoning.
    In this thesis, we propose a geometry-aware framework with deep networks to estimate the indoor layout in both 2D and 3D space.
    We decouple layout estimation into two stages, both performed by deep networks: the first estimates the room layout in the 2D layout space, and the second estimates the parameters of the 3D cuboid layout model.
    The goal of our approach is to predict, from a single image, the 3D cuboid representation of the room layout and the corresponding pose.
    Moreover, with this two-stage formulation, the deep networks are explainable through their intermediate outputs, which carry representative information, and can be extended to other training signals, either jointly or separately.
    Our experiments show that the proposed model provides not only competitive 2D layout estimation but also real-time 3D room layout estimation without post-processing.
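
    To make the idea of a "3D cuboid representation and the corresponding pose" more concrete, the sketch below projects the eight corners of a box-shaped room into an image with a pinhole camera. It is only an illustration under stated assumptions: the parameterization used here (room width, height, depth, a single yaw angle, a camera position, a focal length, and a principal point) and all function names are hypothetical and are not taken from the thesis, whose actual cuboid parameterization is not given in this record.

```python
import math

def cuboid_corners(width, height, depth):
    """Return the 8 corners of an axis-aligned cuboid centred at the origin."""
    hx, hy, hz = width / 2.0, height / 2.0, depth / 2.0
    return [(sx * hx, sy * hy, sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

def world_to_camera(point, yaw, cam_pos):
    """Express a world point in the camera frame.

    Assumption: the camera orientation is a rotation about the y axis only
    (yaw), so the world-to-camera rotation is the transpose of R_y(yaw).
    """
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * z, y, s * x + c * z)

def project(point_cam, focal, principal):
    """Pinhole projection; returns None for points at or behind the camera."""
    x, y, z = point_cam
    if z <= 0.0:
        return None
    return (focal * x / z + principal[0], focal * y / z + principal[1])

def project_room(width, height, depth, yaw, cam_pos, focal, principal):
    """Project all 8 room corners; entries are (u, v) pixels or None."""
    return [project(world_to_camera(p, yaw, cam_pos), focal, principal)
            for p in cuboid_corners(width, height, depth)]

if __name__ == "__main__":
    # Hypothetical 4 x 3 x 6 room with the camera at its centre, looking along +z.
    for uv in project_room(4.0, 3.0, 6.0, 0.0, (0.0, 0.0, 0.0), 500.0, (320.0, 240.0)):
        print(uv)
```

    With the camera at the room centre and zero yaw, only the four corners of the far wall lie in front of the camera; the other four are reported as None. A learned model of the kind described in the abstract would regress such cuboid and pose parameters from the image rather than take them as input.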

    Contents
      1 Introduction
        1.1 Motivation
        1.2 Problem Statement
        1.3 Contributions
        1.4 Thesis Organization
      2 Related Work
        2.1 Room Layout Estimation
        2.2 Room Layout Estimation in Deep learning
        2.3 Camera Pose and Geometry Learning
      3 DeepRoom2D: Layout Estimation in 2D Space
        3.1 Semantic Layout in 2D Representation
        3.2 Multi-task Network Modeling
        3.3 Layout Structure Degeneration Strategy
        3.4 Layout-specific Objective Criterion
          3.4.1 Vanilla Loss of Semantic Segmentation
          3.4.2 Smoothness Term
          3.4.3 Loss of Corner and Edge Detection
          3.4.4 Overall
      4 DeepRoom3D: Layout beyond pixels
        4.1 Cuboid Model Parameterization
        4.2 Regression Forwarding Network
        4.3 Transfer Learning Network
      5 Experimental Results
        5.1 Quantitative Results
          5.1.1 Pixel-wise Accuracy of Layout Estimation
          5.1.2 Accuracy of Corner Detection
        5.2 Qualitative Results
        5.3 Generalization Ability
        5.4 Time Efficiency
        5.5 Demo System
      6 Conclusions
      References

