| Field | Value |
|---|---|
| Author | 郭介銘 (Kuo, Chieh-Ming) |
| Thesis Title | 基於深度學習的臉部表情辨識系統 (Deep Learning Based Facial Expression Recognition System) |
| Advisor | 賴尚宏 (Lai, Shang-Hong) |
| Committee Members | 許秋婷 (Hsu, Chiu-Ting), 陳煥宗 (Chen, Hwann-Tzong), 劉庭祿 (Liu, Tyng-Luh) |
| Degree | Master (碩士) |
| Department | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
| Year of Publication | 2017 |
| Academic Year | 105 |
| Language | English |
| Pages | 51 |
| Keywords | 深度學習 (deep learning), 表情辨識 (expression recognition) |
Facial expression recognition is a classic problem in computer vision. As deep learning based methods have achieved excellent results on many computer vision problems, many recent studies have applied convolutional neural networks to expression recognition.

In this thesis, we propose a deep learning based facial expression recognition system consisting of a face region detection module and a convolutional neural network classifier. In the face detection part, we determine the most appropriate face region from facial landmark points. We design a convolutional neural network architecture better suited to facial expression recognition, and our model's accuracy on standard datasets can be further improved by a recurrent neural network module. In experiments on two standard datasets, our system outperforms the current state-of-the-art methods.

We also collected three datasets from different domains to examine how well the model applies across domains. We propose a data augmentation method based on illumination conditions, which effectively reduces overfitting across domains.

Moreover, because our system does not rely on complicated pre-processing or rectification, and our convolutional neural network makes more efficient use of its parameters, the system runs at about 15 FPS on a notebook with a discrete GPU.
Facial expression recognition is a classical problem in computer vision. Following the recent success of deep learning on a number of computer vision tasks, we propose a deep learning based facial expression recognition system. The system is composed of a face region detection module and a convolutional neural network (CNN) classifier. In the face region detection module, we determine the best face region from facial landmark points. We design our own CNN architecture, which is better suited to the expression recognition task, and the recognition accuracy of our CNN model can be further improved by adding a recurrent neural network module. Experimental results on two standard datasets show that our framework is superior or comparable to the state-of-the-art methods.
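The abstract does not specify how the face region is derived from the landmark points; the following is a minimal sketch, assuming a square crop built from the tight landmark bounding box expanded by a relative margin (the function name, margin value, and landmark format are illustrative assumptions, not the thesis's actual procedure):

```python
def face_region_from_landmarks(landmarks, image_w, image_h, margin=0.2):
    """Compute a square face crop box from 2D landmark points.

    `landmarks` is a list of (x, y) pixel coordinates, e.g. the 68 points
    produced by a typical landmark detector (an assumption here).
    Returns (left, top, right, bottom) clamped to the image bounds.
    """
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    # Expand the tight landmark bounding box by a relative margin so the
    # crop also covers the forehead and chin, then centre a square on it.
    side = max(x1 - x0, y1 - y0) * (1 + margin)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = max(0, int(cx - side / 2))
    top = max(0, int(cy - side / 2))
    right = min(image_w, int(cx + side / 2))
    bottom = min(image_h, int(cy + side / 2))
    return left, top, right, bottom
```

Cropping from landmarks rather than from a raw face-detector box keeps the region tightly centred on the expressive parts of the face before it is fed to the CNN.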
We also collect three datasets from different domains to further investigate the generalization of the CNN model. We propose an illumination augmentation scheme that effectively reduces overfitting when training across different domains. Moreover, because our system does not rely on complicated pre-processing or rectification, it is very efficient and runs at about 15 FPS on a notebook with a discrete GPU.
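The abstract does not give the details of the illumination augmentation scheme; one common way to simulate varied lighting is random gamma correction, sketched below (the function name and gamma range are illustrative assumptions, not the thesis's published parameters):

```python
import random

def augment_illumination(pixels, gamma_range=(0.6, 1.6), rng=None):
    """Randomly re-light an image by gamma correction.

    `pixels` is a flat list of 8-bit grey values.  Drawing a fresh gamma
    per training image simulates darker (gamma > 1) or brighter
    (gamma < 1) lighting conditions without collecting new data.
    """
    rng = rng or random.Random()
    gamma = rng.uniform(*gamma_range)
    # Normalise each pixel to [0, 1], raise to `gamma`, map back to [0, 255].
    return [round(255 * (v / 255.0) ** gamma) for v in pixels]
```

Applied on the fly during training, such a transform exposes the classifier to lighting variation it would otherwise only see in a different domain, which is one plausible reason an illumination-based augmentation reduces cross-domain overfitting.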