| Graduate Student: | 賴盈秀 (Lai, Ying-Hsiu) |
|---|---|
| Thesis Title: | 透過生成對抗網絡學習表情保留特徵的多視角人臉表情辨識 (Emotion-Preserving Representation Learning via Generative Adversarial Network for Multi-view Facial Expression Recognition) |
| Advisor: | 賴尚宏 (Lai, Shang-Hong) |
| Committee Members: | 陳祝嵩 (Chen, Chu-Song), 邱瀞德 (Chiu, Ching-Te), 林嘉文 (Lin, Chia-Wen) |
| Degree: | 碩士 (Master) |
| Department: | |
| Publication Year: | 2017 |
| Graduation Academic Year: | 105 |
| Language: | English |
| Number of Pages: | 47 |
| Keywords (Chinese): | 多視角人臉表情辨識、表情保留人臉轉正、生成對抗網絡 |
| Keywords (English): | Multi-view Facial Expression Recognition, Emotion-Preserving Face Frontalization, Generative Adversarial Network |
Although facial expression recognition has made significant progress with the rise of deep learning, pose variation remains a bottleneck for most existing work. Face frontalization, which reduces multi-view recognition to recognition from a single canonical (frontal) view, is one way to address the large-pose problem. This thesis presents a multi-task learning approach based on a generative adversarial network (GAN) that learns emotion-preserving representations during the face frontalization process. Taking advantage of the adversarial relationship between the generator and the discriminator, the generator frontalizes an input profile face into a realistic-looking frontal face while preserving identity and expression characteristics; at the same time, the learnt emotion-preserving representation is used directly to predict the expression class of the input face. The proposed network is optimized with both synthesis and classification objectives, so that the learnt representations are both generative and discriminative. Experimental results demonstrate that the generated frontal faces are effective for the recognition task, allowing expression recognition methods that perform well only on frontal views to handle large-pose expressions via the frontalized images. In addition, the multi-task learning network outperforms state-of-the-art results on multi-view facial expression recognition on the Multi-PIE database.
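The multi-task objective described above pairs a synthesis loss with an expression-classification loss computed on a shared representation, trained adversarially against a discriminator. The PyTorch sketch below illustrates one way such a network could be wired together; the framework choice, layer sizes, six-class expression head, L1 pixel term, and loss weights (`w_adv`, `w_pix`, `w_cls`) are illustrative assumptions, not the thesis's exact architecture or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Generator(nn.Module):
    """Encoder-decoder generator: profile face -> (frontal face, expression logits).

    The classifier head shares the encoder output, i.e. the emotion-preserving
    representation used for both synthesis and recognition.
    """

    def __init__(self, num_expressions: int = 6):
        super().__init__()
        self.encoder = nn.Sequential(                      # downsample, e.g. 128x128 -> 16x16
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(                      # upsample back to the input size
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
        self.classifier = nn.Sequential(                   # expression head on the shared code
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, num_expressions),
        )

    def forward(self, profile):
        code = self.encoder(profile)                       # emotion-preserving representation
        return self.decoder(code), self.classifier(code)


class Discriminator(nn.Module):
    """Distinguishes real frontal faces from generated ones."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, image):
        return self.net(image)                             # real/fake logit


def generator_loss(G, D, profile, frontal_gt, expr_label,
                   w_adv=1.0, w_pix=10.0, w_cls=1.0):
    """Synthesis + classification objective for the generator (weights are assumptions)."""
    fake_frontal, expr_logits = G(profile)
    d_fake = D(fake_frontal)
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))  # fool D
    pix = F.l1_loss(fake_frontal, frontal_gt)              # match the ground-truth frontal face
    cls = F.cross_entropy(expr_logits, expr_label)         # keep the expression recognizable
    return w_adv * adv + w_pix * pix + w_cls * cls
```

In training, this generator update would alternate with a standard discriminator update on real versus generated frontal faces, as in an ordinary GAN; at test time the classifier head (or the frontalized image fed to a frontal-view recognizer) provides the expression prediction.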