| Author: | Xie, Cheng-Hong (謝承宏) |
|---|---|
| Title: | Deep Learning for Facial Expression Recognition with Data Augmentation Using GAN (基於深度學習之表情辨識使用生成對抗網路資料增強) |
| Advisor: | Huang, Chih-Hao (黃之浩) |
| Committee members: | Sun, Min-Te (孫敏德); Lee, Duan-Shin (李端興) |
| Degree: | Master |
| Department: | Institute of Communications Engineering, College of Electrical Engineering and Computer Science |
| Year of publication: | 2019 |
| Academic year of graduation: | 107 (ROC calendar, i.e., 2018-2019) |
| Language: | Chinese |
| Pages: | 33 |
| Keywords: | Facial Expression Recognition, Convolutional Neural Networks, Generative Adversarial Networks, Data Augmentation |
Among the many ways humans express emotion, facial expressions are the most natural, intuitive, and universal: people convey emotions and intentions through facial expressions in the same way across countries and cultures. The basic expressions are anger, disgust, fear, happiness, sadness, and surprise. Recognizing them under natural conditions is challenging, because ages, head poses, and illumination vary, and natural expression changes are subtle and hard to detect; a reliable facial expression recognition system therefore remains an open problem. In recent years, generative adversarial networks and convolutional neural networks have achieved excellent performance in image processing and computer vision, continually raising image classification accuracy, which inspired us to build the architecture of this thesis on existing models.

In this thesis, we use a convolutional neural network as the classifier and StarGAN to perform data augmentation. We train a generative adversarial network that learns image-to-image translation: taking images from the original dataset as input, it generates images of a specified target class, which we add to the original training set. This enlarges both the amount and the diversity of data in the minority classes, and thereby improves classification performance for facial expression recognition.
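As a concrete illustration of the augmentation step, the sketch below shows how a trained StarGAN-style generator could be used to synthesize extra samples for an under-represented expression class. This is a minimal PyTorch sketch, not the thesis's actual code: the `generator` module, the 7-class label space (FER2013-style), the 128x128 image size, and the external label-concatenation wiring are all assumptions made for the example.

```python
import torch
import torch.nn as nn

# Assumed setup (not from the thesis text): `generator` is a StarGAN-style
# generator already trained on the expression dataset; images are
# (N, 3, 128, 128) tensors; there are 7 expression classes.
NUM_CLASSES = 7

def augment_minority_class(generator: nn.Module,
                           images: torch.Tensor,
                           target_class: int) -> torch.Tensor:
    """Translate a batch of source faces into `target_class` expressions."""
    n, _, h, w = images.shape
    # One-hot target expression label for every image in the batch.
    label = torch.zeros(n, NUM_CLASSES)
    label[:, target_class] = 1.0
    # StarGAN conditions the generator by tiling the domain label over the
    # spatial dimensions and concatenating it to the image channels; some
    # implementations do this inside G(x, c), so treat this wiring as
    # illustrative.
    label_map = label.view(n, NUM_CLASSES, 1, 1).expand(n, NUM_CLASSES, h, w)
    with torch.no_grad():
        return generator(torch.cat([images, label_map], dim=1))

# Hypothetical usage: grow the training data for a minority class such as
# "disgust" by translating faces drawn from better-populated classes, then
# append the synthetic images and their labels to the training set.
# fake = augment_minority_class(G, train_x[:512], disgust_id)
# train_x = torch.cat([train_x, fake])
# train_y = torch.cat([train_y, torch.full((len(fake),), disgust_id)])
```

Because StarGAN is a single multi-domain generator, one trained model can translate toward any target expression, so the same network serves every minority class; the augmented set then trains the CNN classifier as usual.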