
Graduate Student: Lee, Kuo-Wei (李國緯)
Thesis Title: Silhouette-Net: 3D Hand Pose Estimation from Silhouettes (輪廓網路: 從輪廓預測三維度手勢座標)
Advisor: Chen, Hwann-Tzong (陳煥宗)
Committee Members: Hsu, Chiou-Ting (許秋婷); Lin, Yen-Yu (林彥宇)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2020
Graduation Academic Year: 109 (ROC calendar)
Language: English
Number of Pages: 34
Chinese Keywords: 3D estimation, hand pose, gesture, silhouette network, deep learning, computer vision
English Keywords: 3D_Hand_Pose, Silhouette_Net, Depth_Perception
  • Abstract (Chinese)
    3D hand pose estimation has attracted wide attention because of its broad range of applications and has made great progress with the development of deep learning. Existing methods mainly consider different input modalities and settings, such as monocular RGB, multi-view RGB, depth, or point clouds, to provide sufficient information for resolving the variations caused by occlusion or viewpoint change. In contrast, this work pursues a less-explored direction: estimating 3D hand poses from minimal information. We propose a new architecture that derives guidance from implicit depth perception and resolves the ambiguity of hand poses through end-to-end training. Experimental results show that 3D hand poses can be estimated accurately from hand silhouettes alone, without depth maps as input. Evaluations on the HIM2017 benchmark dataset further demonstrate that our method achieves comparable or even better performance than recent depth-based approaches and can be regarded as the state of the art for estimating 3D hand poses from silhouettes.


    Abstract
    3D hand pose estimation has received a lot of attention for its wide range of applications and has made great progress owing to the development of deep learning. Existing approaches mainly consider different input modalities and settings, such as monocular RGB, multi-view RGB, depth, or point cloud, to provide sufficient cues for resolving variations caused by self-occlusion and viewpoint change. In contrast, this work aims to address the less-explored idea of using minimal information to estimate 3D hand poses. We present a new architecture that automatically learns guidance from implicit depth perception and solves the ambiguity of hand pose through end-to-end training. The experimental results show that 3D hand poses can be accurately estimated from solely hand silhouettes without using depth maps. Extensive evaluations on the 2017 Hands in the Million Challenge (HIM2017) benchmark dataset further demonstrate that our method achieves comparable or even better performance than recent depth-based approaches and serves as the state of the art of its kind on estimating 3D hand poses from silhouettes.
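
    To make the setting described in the abstract concrete, the following is a minimal, purely illustrative PyTorch sketch. It is not the thesis's Silhouette-Net architecture (whose details are in the "Proposed Framework" chapter); the 21-joint skeleton, layer sizes, and auxiliary depth-reconstruction loss are all assumptions introduced here for illustration. It only demonstrates the general idea stated above: the network takes a binary silhouette as its sole input, regresses 3D joint coordinates, and is trained end to end, with depth maps used only as auxiliary supervision (a stand-in for implicit depth perception), never as test-time input.

# Hypothetical illustration only -- layer sizes, joint count, and the auxiliary
# depth loss are assumptions, not the architecture proposed in the thesis.
import torch
import torch.nn as nn

NUM_JOINTS = 21  # assumption: a common 21-joint hand skeleton

class SilhouettePoseSketch(nn.Module):
    def __init__(self, num_joints: int = NUM_JOINTS):
        super().__init__()
        self.num_joints = num_joints
        # Shared encoder over the single-channel silhouette input.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Auxiliary decoder: reconstructs a coarse depth map from silhouette
        # features, standing in for the implicit depth-perception guidance.
        self.depth_head = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # Pose head: regresses (x, y, z) for every joint from pooled features.
        self.pose_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 256), nn.ReLU(inplace=True),
            nn.Linear(256, num_joints * 3),
        )

    def forward(self, silhouette):
        feat = self.encoder(silhouette)              # (B, 128, H/8, W/8)
        coarse_depth = self.depth_head(feat)         # (B, 1, H, W)
        joints = self.pose_head(feat)                # (B, num_joints * 3)
        return joints.view(-1, self.num_joints, 3), coarse_depth

if __name__ == "__main__":
    model = SilhouettePoseSketch()
    silhouette = torch.rand(2, 1, 128, 128).round()  # fake binary hand masks
    gt_joints = torch.randn(2, NUM_JOINTS, 3)        # fake 3D joint targets
    gt_depth = torch.rand(2, 1, 128, 128)            # depth used only as supervision
    pred_joints, pred_depth = model(silhouette)
    loss = nn.functional.mse_loss(pred_joints, gt_joints) \
        + 0.1 * nn.functional.mse_loss(pred_depth, gt_depth)
    loss.backward()                                  # one end-to-end training step

    The key property the sketch tries to convey is that the depth branch can simply be dropped at inference time, so the deployed model needs nothing beyond the hand silhouette.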

    Table of Contents:
    List of Tables (p. 5)
    List of Figures (p. 6)
    Abstract (Chinese) (p. 8)
    Abstract (p. 9)
    1 Introduction (p. 10)
    2 Related Work (p. 14)
    3 Proposed Framework (p. 17)
    4 Experiments (p. 23)
    5 Conclusion (p. 30)
    Bibliography (p. 31)

