
Graduate Student: 姚詩軒 (Yao, Shih-Hsuan)
Thesis Title: SegmentedFusion: Reconstruction of Human Motion Using Multi-deformed Bounding-boxes
Advisor: 賴尚宏 (Lai, Shang-Hong)
Committee Members: 陳煥宗 (Chen, Hwann-Tzong), 朱宏國 (Chu, Hung-Kuo), 黃思皓 (Huang, Szu-Hao)
Degree: Master
Department:
Year of Publication: 2018
Academic Year of Graduation: 106
Language: English
Number of Pages: 45
Keywords: reconstruction, non-rigid motion, human motion model, single depth camera


    In this paper, we present SegmentedFusion, a system capable of reconstructing the non-rigid motion of a human model using a single depth camera with skeleton information. Our approach segments the body into parts, builds a canonical space for each part, and estimates a dense volumetric 6D motion field that warps the integrated model into the live frame. The key feature of this work is that a deformed and connected canonical bounding-box is created for each body part and used as a volume into which depth data are integrated. The dense volumetric warp field of each volume is represented efficiently by two sets of rigid transformation parameters. Overall, SegmentedFusion is a memory-efficient system that scans a non-rigidly deforming human surface and estimates a dense motion field using a consumer-grade depth camera.
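As an illustrative sketch only (not the thesis's actual formulation), the idea of representing a per-part warp field with two sets of rigid transformation parameters can be approximated by blending two rigid transforms per point, e.g. the motions of the two bones bounding a body part. All function and variable names below are hypothetical:

```python
import numpy as np

def apply_rigid(R, t, p):
    """Apply a rigid transform (3x3 rotation R, translation t) to a 3D point."""
    return R @ p + t

def warp_point(p, R_a, t_a, R_b, t_b, w):
    """Warp a canonical-space point by linearly blending two rigid
    transforms; w in [0, 1] is the blend weight toward transform A."""
    return w * apply_rigid(R_a, t_a, p) + (1.0 - w) * apply_rigid(R_b, t_b, p)

# Example: blend an identity transform with a pure translation.
I = np.eye(3)
p = np.zeros(3)
q = warp_point(p, I, np.array([1.0, 0.0, 0.0]), I, np.zeros(3), 0.5)
# q is p shifted halfway toward the translated transform.
```

Note that linearly blending rotations is only an approximation; reconstruction systems of this kind commonly use dual-quaternion blending so that the blended result remains close to rigid.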
    The experimental results demonstrate that our system is robust against fast inter-frame motion and topology changes. Since our method requires no prior assumptions, SegmentedFusion can be applied to real data containing a wide range of human motion.
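To illustrate how a canonical bounding-box can serve as a volume into which depth data are integrated, the following is a minimal truncated signed distance function (TSDF) update in the style of classic volumetric fusion. The function names and the simplified per-ray geometry are assumptions for illustration, not the thesis's implementation:

```python
import numpy as np

def integrate(tsdf, weight, voxel_depth, observed_depth, trunc=0.05):
    """Fuse one depth observation into a TSDF volume (flattened arrays).

    voxel_depth:    depth of each voxel center along its camera ray
    observed_depth: depth measured by the sensor for that ray
    """
    sdf = observed_depth - voxel_depth       # signed distance to the surface
    valid = sdf > -trunc                     # skip voxels far behind the surface
    d = np.clip(sdf / trunc, -1.0, 1.0)      # truncate to [-1, 1]
    # Running weighted average over observations, as in volumetric fusion.
    tsdf[valid] = (tsdf[valid] * weight[valid] + d[valid]) / (weight[valid] + 1.0)
    weight[valid] += 1.0
    return tsdf, weight

# Example: one observation at depth 1.0 m updates voxels near the surface
# and leaves a voxel far behind it untouched.
tsdf, weight = integrate(np.zeros(3), np.zeros(3),
                         np.array([1.0, 1.04, 2.0]), 1.0)
```

Accumulating observations in a per-part canonical volume in this way lets the surface be extracted later (e.g. with marching cubes) from the zero level set of the fused TSDF.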

    1 Introduction 1
    1.1 Motivation 1
    1.2 Problem Statement 3
    1.3 Contributions 4
    1.4 Thesis Organization 5
    2 Related Work 6
    2.1 Reconstruction of Non-rigid Scenes 6
    2.2 Reconstruction of Rigid Scenes 7
    3 Overview 11
    3.1 System Pipeline 11
    3.2 Notation and Preliminaries 13
    4 Proposed System 16
    4.1 Body Part Segmentation 16
    4.2 Canonical Bounding-box 20
    4.3 Warp Field 22
    4.3.1 Bone Motion 23
    4.3.2 Joint Deformation 24
    4.3.3 Refinement 27
    4.4 Fusion 30
    5 Experimental Results 32
    5.1 Qualitative Results 32
    5.2 Limitations and Future Work 37
    6 Conclusions 40
    References 43

