Graduate Student: 李芝宇 Lee, Chih-Yu
Thesis Title: 結合不同層級特徵於語音情緒辨識之研究 Combining Different Levels of Features for Emotion Recognition in Speech
Advisor: 張智星 Jang, Jyh-Shing Roger
Committee Members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of Publication: 2011
Academic Year of Graduation: 99 (2010-2011)
Language: English
Number of Pages: 39
Chinese Keywords: 情緒辨識 (emotion recognition)
Emotion recognition is widely applied in many fields, and recognizing emotions accurately has always been the goal of this line of research. We argue that speech features extracted at different timing levels provide different kinds of information about the speech signal, and that combining features from different levels can therefore improve the recognition rate. The experimental results show that combining features across levels effectively compensates for the information that each individual level lacks. We propose several ways of combining features from different levels, and our experiments confirm that such combinations do improve the recognition rate. The experiments use a German emotional speech corpus with seven emotion categories and the English eNTERFACE emotion corpus with six. The extracted speech features fall into two groups: the first group is extracted at the frame level and includes energy, pitch, and Mel-frequency cepstral coefficients; the second group is extracted at the segment and utterance levels in the form of low-level descriptors (LLDs). The experiments show that, compared with features from a single level, combining features from multiple levels effectively improves the recognition rate.
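The feature levels described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical example that assumes the librosa library (the thesis does not name a specific toolkit): the frame-level features follow the abstract (energy, pitch, and MFCCs), while simple mean/standard-deviation functionals stand in for the segment- and utterance-level LLDs actually used in the thesis.

```python
# Hypothetical sketch of three timing-level features; librosa and the mean/std
# functionals are assumptions, not the thesis's exact pipeline.
import numpy as np
import librosa

def frame_level_features(path, sr=16000, n_mfcc=13):
    """Frame-level features: energy, pitch (F0), and MFCCs."""
    y, sr = librosa.load(path, sr=sr)
    energy = librosa.feature.rms(y=y)                        # shape (1, T)
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)
    f0 = np.nan_to_num(f0)[np.newaxis, :]                    # shape (1, T)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, T)
    t = min(energy.shape[1], f0.shape[1], mfcc.shape[1])
    return np.vstack([energy[:, :t], f0[:, :t], mfcc[:, :t]])

def segment_and_utterance_features(frames, n_segments=3):
    """Segment- and utterance-level descriptors: mean/std functionals over
    fixed segments and over the whole utterance (a stand-in for the LLDs)."""
    segments = np.array_split(frames, n_segments, axis=1)
    seg_feats = np.concatenate([np.r_[s.mean(axis=1), s.std(axis=1)] for s in segments])
    utt_feats = np.r_[frames.mean(axis=1), frames.std(axis=1)]
    return seg_feats, utt_feats
```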
Emotion recognition has been successfully applied in many fields. It is believed that features extracted at each timing level provide different information about the emotional speech signal and can therefore compensate for one another. To achieve promising recognition accuracy, this thesis proposes several methods for combining features extracted at different timing levels, including likelihood combination, weighted likelihood combination, raw feature combination, and partial raw feature combination. We extract spectral and prosodic features at the frame level, and low-level descriptors (LLDs) at the segment and utterance levels. The Berlin Emotion Database and the eNTERFACE emotion database are used in the experiments. Compared with conventional features from one or two timing levels, the combination of features from all three timing levels yields a higher recognition rate.
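The four fusion schemes named in the abstract can be sketched as follows. This is a minimal illustration under the assumption that each timing level already yields per-class scores (e.g. log-likelihoods from a level-specific classifier) and a raw feature vector; the actual classifiers, weight values, and the criterion behind the "partial" feature selection are not specified here.

```python
# Hypothetical sketch of the four combination schemes; the classifiers,
# weights, and feature-selection criterion are assumptions.
import numpy as np

def likelihood_combination(scores_per_level):
    """Sum per-class scores (e.g. log-likelihoods) over all timing levels
    and pick the class with the highest combined score."""
    return int(np.argmax(np.sum(scores_per_level, axis=0)))

def weighted_likelihood_combination(scores_per_level, weights):
    """Same idea, but each level's scores get a weight (typically tuned
    on a validation set)."""
    return int(np.argmax(np.tensordot(weights, scores_per_level, axes=1)))

def raw_feature_combination(features_per_level):
    """Concatenate the raw feature vectors of all levels into one vector,
    to be fed to a single classifier (e.g. an SVM)."""
    return np.concatenate(features_per_level)

def partial_raw_feature_combination(features_per_level, selected_indices):
    """Concatenate only a selected subset of each level's features; the
    selection criterion here is a placeholder."""
    return np.concatenate([f[idx] for f, idx in zip(features_per_level, selected_indices)])

# Toy usage: three timing levels (frame, segment, utterance), four classes.
scores = np.array([[0.1, 0.5, 0.2, 0.2],
                   [0.3, 0.4, 0.2, 0.1],
                   [0.2, 0.6, 0.1, 0.1]])
print(likelihood_combination(scores))                                      # -> 1
print(weighted_likelihood_combination(scores, np.array([0.5, 0.3, 0.2])))  # -> 1
```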