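Self Adversarial Training for Human Pose Estimation (用於人體姿勢估測之對抗式訓練方法) | National Tsing Hua University Theses and Dissertations Repository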


Author:	周佳蓉 Chou, Chia-Jung
Thesis title:	Self Adversarial Training for Human Pose Estimation (用於人體姿勢估測之對抗式訓練方法)
Advisor:	陳煥宗 Chen, Hwann-Tzong
Committee members:	賴尚宏 Lai, Shang-Hong; 劉庭祿 Liu, Tyng-Luh
Degree:	Master
Department:
Year of publication:	2017
Academic year of graduation:	105 (ROC calendar)
Language:	English
Number of pages:	36
Keywords:	Human Pose Estimation, Generative Adversarial Network, Fully Convolutional Neural Network

This thesis presents a deep-learning-based approach to the problem of human pose
estimation. We employ generative adversarial networks as our learning paradigm, in
which we set up two stacked hourglass networks with the same architecture, one as
the generator and the other as the discriminator. After training, the generator is
used directly as the human pose estimator. The discriminator distinguishes ground-truth
heatmaps from generated ones, and the resulting adversarial loss is back-propagated
to the generator. This process enables the generator to learn plausible human body
configurations, and our experiments show that it improves prediction accuracy.
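
The training signal described in the abstract can be illustrated with a small NumPy sketch. It is not the thesis implementation: the abstract does not state the loss formulas, so the sketch assumes a discriminator that outputs reconstructed heatmaps (which would let it share the generator's architecture) and scores a heatmap by its reconstruction error. The names `gaussian_heatmap`, `reconstruction_error`, `adversarial_losses`, and `toy_discriminator`, the 64x64 heatmap size, and the weights `k` and `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Ground-truth style heatmap: a 2D Gaussian peaked at joint (cx, cy)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def reconstruction_error(discriminator, heatmaps):
    """Mean squared error between heatmaps and the discriminator's output."""
    return float(np.mean((discriminator(heatmaps) - heatmaps) ** 2))


def adversarial_losses(generated, groundtruth, discriminator, k=0.5, lam=0.01):
    """Return (generator_loss, discriminator_loss) for one batch.

    Assumed formulation (not taken from the thesis):
      - the discriminator wants a low reconstruction error on ground-truth
        heatmaps and a high one on generated heatmaps;
      - the generator minimizes its heatmap regression error plus a
        weighted adversarial term, i.e. how poorly the discriminator
        reconstructs its output.
    """
    loss_real = reconstruction_error(discriminator, groundtruth)
    loss_fake = reconstruction_error(discriminator, generated)
    loss_mse = float(np.mean((generated - groundtruth) ** 2))
    d_loss = loss_real - k * loss_fake
    g_loss = loss_mse + lam * loss_fake
    return g_loss, d_loss


# Toy usage: one joint, ground truth at (20, 30), prediction off by 2 pixels.
gt = gaussian_heatmap(64, 64, cx=20, cy=30)[None]    # shape (1, 64, 64)
pred = gaussian_heatmap(64, 64, cx=22, cy=30)[None]


def toy_discriminator(h):
    """Stand-in for a trained network: slightly attenuates its input."""
    return 0.9 * h


g_loss, d_loss = adversarial_losses(pred, gt, toy_discriminator)
```

In a real setup both networks would be trained alternately with gradient descent on `d_loss` and `g_loss`; here the discriminator is a fixed function so that the loss computation can be followed by hand.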

Introduction
Related Work
  1 Human Pose Estimation
  2 Generative Adversarial Networks
Adversarial Training with the Stacked Hourglass Networks
  1 Generator
    1.1 Network Architecture
    1.2 Training the Generator
  2 Discriminator
    2.1 Training the Discriminator
  3 Adversarial Training
Experiments
  1 Datasets
  2 Evaluation Metrics
  3 Results
    3.1 LSP
    3.2 MPII
    3.3 LIP
  4 Analysis
Conclusion
                                

