
Student: 賴奕融 (Lai, Yi-Jung)
Title: An Evaluation of Models for Occluded Hand Pose Estimation (手部遮蔽姿態評估之模型分析)
Advisor: 金仲達 (King, Chung-Ta)
Committee Members: 王家祥 (Wang, Jia-Shung); 胡敏君 (Hu, Min-Chun)
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 108
Language: English
Pages: 35
Keywords (Chinese): 人體姿態估計、手部姿態、現實差距、遮蔽
Keywords (English): Human pose estimation, Hand pose, Reality gap, Occlusion
Abstract (Chinese, translated):
    Human pose estimation from color images is a popular research topic, since images are the easiest data type to acquire, and many groups have applied deep learning to the task. However, due to a lack of suitable training data, current methods cannot produce accurate predictions when the scene contains occlusion, especially when a hand is grasping an object. In this thesis, we collect hand and arm pose data in a virtual environment. We train and evaluate several models from the literature on this data and report their prediction accuracy under different parameters and architectures. We evaluate the effect of training with occluded images and explore several techniques for bridging the reality gap. Building on the 2D pose estimates, we extend the problem from 2D to 3D. Finally, we report results from testing in real-world scenes.


Abstract:
    Human pose estimation using only RGB images is a popular research topic, since images are the easiest data type to acquire, and much research has applied deep learning to the task. However, for occluded scenes, especially when hands are holding objects, there is still room for improvement. One reason is the lack of datasets containing grasping images, which makes such conditions harder for supervised learning. In this work, we collect a dataset of hand and arm poses in a virtual environment. We evaluate baseline approaches by studying different structures and parameters to compare the performance of these models, and we evaluate the effect of training with occlusion data. We try several strategies to bridge the reality gap. The pose estimation task is then extended into 3D based on our 2D detections. Finally, we report qualitative results in real-world scenes.
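The 2D baselines surveyed in the thesis (e.g. stacked hourglass [3] and simple baselines [21]) regress one heatmap per keypoint rather than coordinates directly. As an illustration of that heatmap design only — the map resolution and Gaussian width below are assumptions, not the thesis's configuration — generating and decoding a Gaussian target map might look like:

```python
import numpy as np

def keypoint_heatmap(x, y, size=64, sigma=2.0):
    """Render one keypoint (x, y) as a 2D Gaussian target map.

    The map peaks at 1.0 on the keypoint; a 2D pose network is
    trained to regress one such map per joint.
    """
    xs = np.arange(size, dtype=np.float32)   # column coordinates
    ys = xs[:, None]                          # row coordinates (broadcast)
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

def decode_heatmap(heatmap):
    """Recover (x, y) as the argmax location of the heatmap."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)

heatmap = keypoint_heatmap(20, 30)
print(decode_heatmap(heatmap))  # -> (20, 30)
```

Argmax decoding loses sub-pixel precision, which is one motivation for the integral regression of [15].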

Table of Contents:
    Chinese Abstract
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
    Chapter 2  Background
        2.1  3D Pose Estimation
            2.1.1  RGB Image Based 2D Human Pose Estimation
            2.1.2  Heatmap Design
            2.1.3  Lifting 2D Keypoints into 3D
            2.1.4  RGB Image Based Hand Pose Estimation
        2.2  Public Hand Dataset
    Chapter 3  Evaluation Methodology
        3.1  2D Pose Estimation
            3.1.1  Baseline Structure
            3.1.2  Evaluation Details
            3.1.3  Comparison with Non-occlusion Data
            3.1.4  Reality Gap
        3.2  3D Pose Estimation
            3.2.1  Linear Model
            3.2.2  Convolutional Model
            3.2.3  Hand Branch Ensemble
            3.2.4  Evaluation Details
    Chapter 4  Experiments
        4.1  Data Collection
            4.1.1  Training Dataset
            4.1.2  Testing Dataset
        4.2  Training Details
            4.2.1  2D Model
                4.2.1.1  Baseline Structure
            4.2.2  3D Model
                4.2.2.1  Linear Model
                4.2.2.2  Convolutional Model
                4.2.2.3  Hand Branch Ensemble
        4.3  Evaluation Results
            4.3.1  Evaluation Metrics
            4.3.2  2D Model
            4.3.3  Comparison with Non-occlusion Data
            4.3.4  Reality Gap
            4.3.5  3D Model
    Chapter 5  Conclusion
    Chapter 6  Future Works
    References
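Section 3.2.1 evaluates a linear model for lifting 2D keypoints to 3D in the spirit of the simple baseline of Martinez et al. [18]: dense layers with ReLU and a residual connection mapping flattened 2D joints to 3D joints. The numpy sketch below illustrates that shape contract only; the joint count, hidden width, and random (untrained) weights are assumptions, and a real model would be trained, e.g. in Keras [23]:

```python
import numpy as np

rng = np.random.default_rng(0)
N_JOINTS = 21   # assumed hand joint count
WIDTH = 1024    # hidden width, as in Martinez et al.

def dense(n_in, n_out):
    """Random (weight, bias) pair standing in for trained parameters."""
    return rng.normal(0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

W_in, b_in = dense(N_JOINTS * 2, WIDTH)     # input projection: 2D joints -> hidden
W_h1, b_h1 = dense(WIDTH, WIDTH)            # residual block, first layer
W_h2, b_h2 = dense(WIDTH, WIDTH)            # residual block, second layer
W_out, b_out = dense(WIDTH, N_JOINTS * 3)   # output projection: hidden -> 3D joints

def relu(x):
    return np.maximum(x, 0.0)

def lift_2d_to_3d(kp2d):
    """Map flattened 2D keypoints (N, 2*J) to 3D joints (N, 3*J).

    One residual block (two ReLU linear layers plus a skip connection)
    sits between the input and output projections.
    """
    h = relu(kp2d @ W_in + b_in)
    r = relu(relu(h @ W_h1 + b_h1) @ W_h2 + b_h2)
    h = h + r  # residual connection
    return h @ W_out + b_out

batch = rng.normal(size=(4, N_JOINTS * 2))
print(lift_2d_to_3d(batch).shape)  # (4, 63)
```

The appeal of this design, noted in [18], is that it decouples 3D reasoning from image appearance: any 2D detector's output can be fed to the lifting network.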

    References:
    [1] A. Toshev and C. Szegedy. DeepPose: Human pose estimation via deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
    [2] J. Tompson, A. Jain, Y. LeCun, and C. Bregler. Joint training of a convolutional network and a graphical model for human pose estimation. In Conference on Neural Information Processing Systems (NIPS), 2014.
    [3] A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision (ECCV), 2016.
    [4] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In IEEE International Conference on Computer Vision (ICCV), 2017.
    [5] S. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh. Convolutional pose machines. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
    [6] F. Mueller, F. Bernard, O. Sotnychenko, D. Mehta, S. Sridhar, D. Casas, and C. Theobalt. GANerated hands for real-time 3D hand tracking from monocular RGB. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    [7] G. Garcia-Hernando, S. Yuan, S. Baek, and T. Kim. First-person hand action benchmark with RGB-D videos and 3D hand pose annotations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    [8] C. Zimmermann and T. Brox. Learning to estimate 3D hand pose from single RGB images. In IEEE International Conference on Computer Vision (ICCV), 2017.
    [9] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
    [10] F. Mueller, D. Mehta, O. Sotnychenko, S. Sridhar, D. Casas, and C. Theobalt. Real-time hand tracking under occlusion from an egocentric RGB-D sensor. In IEEE International Conference on Computer Vision (ICCV), 2017.
    [11] J. Zhang, J. Jiao, M. Chen, L. Qu, X. Xu, and Q. Yang. A hand pose tracking benchmark from stereo matching. In IEEE International Conference on Image Processing (ICIP), 2017.
    [12] T. Simon, H. Joo, I. Matthews, and Y. Sheikh. Hand keypoint detection in single images using multiview bootstrapping. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
    [13] G. Moon, J. Chang, and K. Lee. V2V-PoseNet: Voxel-to-Voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    [14] L. Ge, H. Liang, J. Yuan, and D. Thalmann. Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
    [15] X. Sun, B. Xiao, F. Wei, S. Liang, and Y. Wei. Integral human pose regression. In European Conference on Computer Vision (ECCV), 2018.
    [16] J. Deng, W. Dong, R. Socher, L. Li, K. Li and F. Li. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
    [17] C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014.
    [18] J. Martinez, R. Hossain, J. Romero, and J. J. Little. A simple yet effective baseline for 3d human pose estimation. In IEEE International Conference on Computer Vision (ICCV), 2017.
    [19] Unreal Engine 4. [Online]. Available: https://www.unrealengine.com
    [20] P. Martinez-Gonzalez, S. Oprea, A. Garcia-Garcia, A. Jover-Alvarez, S. Orts-Escolano, and J. Garcia-Rodriguez. UnrealROX: An extremely photorealistic virtual reality environment for robotics simulations and synthetic data generation. arXiv preprint arXiv:1810.06936, 2018.
    [21] B. Xiao, H. Wu, and Y. Wei. Simple baselines for human pose estimation and tracking. In European Conference on Computer Vision (ECCV), 2018.
    [22] Y. Zhou, J. Lu, K. Du, X. Lin, Y. Sun, and X. Ma. HBE: Hand branch ensemble network for real-time 3d hand pose estimation. In European Conference on Computer Vision (ECCV), 2018.
    [23] Keras. [Online]. Available: https://www.tensorflow.org/guide/keras
    [24] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv preprint arXiv:1502.01852, 2015.
    [25] Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun. Cascaded pyramid network for multi-person pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
    [26] D. Pavllo, C. Feichtenhofer, D. Grangier, and M. Auli. 3D human pose estimation in video with temporal convolutions and semi-supervised training. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
    [27] L. Pishchulin, E. Insafutdinov, S. Tang, B. Andres, M. Andriluka, P. Gehler, and B. Schiele. DeepCut: Joint subset partition and labeling for multi person pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
    [28] Y. Cai, L. Ge, J. Cai, and Junsong Yuan. Weakly-supervised 3d hand pose estimation from monocular RGB Images. In European Conference on Computer Vision (ECCV), 2018.
    [29] X. Zhou, X. Sun, W. Zhang, S. Liang, and Y. Wei. Deep kinematic pose regression. In European Conference on Computer Vision (ECCV) Workshop, 2016.
    [30] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV), 2014.
    [31] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
    [32] Procrustes analysis. [Online]. Available: https://en.wikipedia.org/wiki/Procrustes_analysis
    [33] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
    [34] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems (NIPS), 2014.
    [35] H. Fang, G. Lu, X. Fang, J. Xie, Y. Tai, and C. Lu. Weakly and semi supervised human body part parsing via pose-guided knowledge transfer. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
