
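Graduate student: Chou, Chia-Jung (周佳蓉)
Thesis title: Self Adversarial Training for Human Pose Estimation
Chinese title: 用於人體姿勢估測之對抗式訓練方法 ("An Adversarial Training Method for Human Pose Estimation")
Advisor: Chen, Hwann-Tzong (陳煥宗)
Oral examination committee: Lai, Shang-Hong (賴尚宏); Liu, Tyng-Luh (劉庭祿)
Degree: Master
Department:
Year of publication: 2017
Academic year of graduation: 105 (Republic of China calendar, i.e. the 2016/17 academic year)
Language: English
Number of pages: 36
Keywords: Human Pose Estimation, Generative Adversarial Network, Fully Convolutional Neural Network (Chinese keywords: 人體姿勢估測, 對抗式生成網路, 全卷積類神經網路)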

    This thesis presents a deep-learning-based approach to human pose estimation. We
    adopt generative adversarial networks as our learning paradigm: we set up two
    stacked hourglass networks with the same architecture, one serving as the generator
    and the other as the discriminator. After training, the generator alone is used as
    the human pose estimator. The discriminator learns to distinguish ground-truth
    heatmaps from generated ones, and the resulting adversarial loss is back-propagated
    to the generator. This process enables the generator to learn plausible human body
    configurations, and our experiments show that it improves prediction accuracy.
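    The abstract describes the training signal only in words. The sketch below (Python,
    standard library only) illustrates the loss bookkeeping under assumptions that this
    record does not state: ground-truth heatmaps are rendered as 2D Gaussians, a joint is
    read off a heatmap at its maximum, the generator is trained with a mean-squared heatmap
    loss plus a weighted adversarial term, and, because the discriminator shares the
    hourglass architecture and therefore outputs heatmaps, its score is assumed to be the
    L1 reconstruction error with a boundary-equilibrium (BEGAN-style) balancing variable k.
    The function names and the values of lam, gamma and lam_k are hypothetical, and the
    hourglass networks themselves are not implemented; their outputs are passed in.

```python
# Hypothetical sketch: BEGAN-style reconstruction losses are an assumption here,
# not a verified description of the thesis implementation.
import math
from typing import List, Tuple

Heatmap = List[List[float]]


def gaussian_heatmap(h: int, w: int, cx: float, cy: float, sigma: float = 1.0) -> Heatmap:
    """Ground-truth heatmap: a 2D Gaussian centred on the joint location (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def decode_joint(hm: Heatmap) -> Tuple[int, int]:
    """Read a joint estimate off a heatmap as the (x, y) of its highest value."""
    _, x, y = max((v, x, y) for y, row in enumerate(hm) for x, v in enumerate(row))
    return x, y


def _mean(a: Heatmap, b: Heatmap, f) -> float:
    """Average f(p, q) over corresponding pixels of two equally sized heatmaps."""
    vals = [f(p, q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(vals) / len(vals)


def adversarial_step(gt: Heatmap, pred: Heatmap, d_gt: Heatmap, d_pred: Heatmap,
                     k: float, lam: float = 0.01, gamma: float = 0.5,
                     lam_k: float = 0.001) -> Tuple[float, float, float]:
    """Loss bookkeeping for one update (assumed BEGAN-style formulation).

    d_gt and d_pred are the discriminator's reconstructions of the ground-truth
    and the predicted heatmaps; the networks themselves are not modelled here.
    """
    rec_real = _mean(gt, d_gt, lambda p, q: abs(p - q))      # D should reconstruct real maps well
    rec_fake = _mean(pred, d_pred, lambda p, q: abs(p - q))  # ...and generated maps badly
    loss_d = rec_real - k * rec_fake
    # Generator: heatmap regression (MSE) plus a weighted adversarial term.
    loss_g = _mean(pred, gt, lambda p, q: (p - q) ** 2) + lam * rec_fake
    # Equilibrium variable k, clipped to [0, 1], balances the two D objectives.
    k_next = min(max(k + lam_k * (gamma * rec_real - rec_fake), 0.0), 1.0)
    return loss_d, loss_g, k_next


gt = gaussian_heatmap(8, 8, cx=5, cy=2)
pred = gaussian_heatmap(8, 8, cx=4, cy=3)
print(decode_joint(gt), decode_joint(pred))  # (5, 2) (4, 3)

# Toy reconstructions: D reproduces the real map exactly and returns zeros for the fake.
zeros = [[0.0] * 8 for _ in range(8)]
loss_d, loss_g, k = adversarial_step(gt, pred, d_gt=gt, d_pred=zeros, k=0.0)
print(loss_d, k)  # 0.0 0.0
```

    In a real training loop, pred would come from the generator and d_gt, d_pred from the
    discriminator at every step. A reconstruction-based score is only one option; a
    standard binary real/fake classifier on top of the discriminator would be another.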

    Table of contents:
    1 Introduction
    2 Related Work
      2.1 Human Pose Estimation
      2.2 Generative Adversarial Networks
    3 Adversarial Training with the Stacked Hourglass Networks
      3.1 Generator
        3.1.1 Network Architecture
        3.1.2 Training the Generator
      3.2 Discriminator
        3.2.1 Training the Discriminator
      3.3 Adversarial Training
    4 Experiments
      4.1 Datasets
      4.2 Evaluation Metrics
      4.3 Results
        4.3.1 LSP
        4.3.2 MPII
        4.3.3 LIP
      4.4 Analysis
    5 Conclusion

