| Field | Value |
|---|---|
| Graduate Student | 廖伶伶 Liao, Ling-Ling |
| Thesis Title | 利用聯合因素分析研究大腦磁振神經影像之時間效應以改善情緒辨識系統 (Improving Categorical Emotion Recognition System by Joint Factor Analysis from Time Effect on fMRI) |
| Advisor | 李祈均 Lee, Chi-Chun |
| Committee Members | 曹昱 Tsao, Yu; 郭立威 Kuo, Li-Wei; 江振宇 Chiang, Chen-Yu |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
| Year of Publication | 2017 |
| Graduation Academic Year | 106 (ROC calendar) |
| Language | Chinese |
| Pages | 45 |
| Keywords (Chinese) | 語音情緒辨識、功能性磁振造影、時間、聯合因素分析、激動程度、情緒正負向 |
| Keywords (English) | Speech emotion recognition, functional magnetic resonance imaging, time, joint factor analysis, activation, valence |
Abstract: The most common approach to speech emotion recognition builds machine learning models from overt human behavioral signals such as voice, facial images, text, and body movements. Other studies have tried to understand the process of emotion perception through internal physiological signals such as electroencephalography, electrocardiography, and functional magnetic resonance imaging (fMRI). In this study, subjects listened to emotional speech while their brain activity was recorded with magnetic resonance imaging, and we used the resulting functional brain images to build a recognition system for activation (arousal) and valence. However, prolonged scanning changes the subjects' brain activity, for example through fatigue and shifts of attention; this additional brain activity, accumulated over the course of the session, behaves like noise and degrades activation and valence recognition. We therefore applied a joint factor analysis algorithm to the fMRI data as a way of separating the brain's response to the emotional speech stimulus from the time-effect signal. Features processed with this method yield better activation and valence recognition than unprocessed features.
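The abstract describes separating a time effect from the emotion-related signal with joint factor analysis before classification. Below is a minimal, hypothetical Python sketch of that idea, not the thesis's actual pipeline: each fMRI-derived feature vector is treated as a global mean plus an emotion component plus a time-effect component; the time-effect subspace is estimated from time-block means (a simple SVD stand-in for the EM estimation used in full joint factor analysis), that component is removed, and an SVM is trained as an illustrative classifier. All data, variable names, and dimensionalities are assumptions made for the example.

```python
# Minimal sketch of time-effect compensation in the spirit of joint factor analysis:
#   x = mu + V*y (emotion subspace) + U*w (time-effect subspace) + noise
# Here U is estimated by SVD of time-block mean deviations and projected out.
# All names (features, emotion_labels, time_blocks) and the random data are
# hypothetical stand-ins, not the author's pipeline.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score


def estimate_subspace(X, groups, mu, n_factors):
    """SVD of group-mean deviations from the global mean -> loading matrix U."""
    means = np.stack([X[groups == g].mean(axis=0) - mu for g in np.unique(groups)])
    # Right singular vectors give the directions explained by the grouping factor.
    _, _, vt = np.linalg.svd(means, full_matrices=False)
    return vt[:n_factors].T                      # shape: (dim, n_factors)


def remove_time_effect(X, time_blocks, n_time_factors=2):
    """Subtract each sample's component lying in the estimated time-effect subspace."""
    mu = X.mean(axis=0)
    U = estimate_subspace(X, time_blocks, mu, n_time_factors)   # time subspace
    w = (X - mu) @ U                              # per-sample time factors
    return X - w @ U.T                            # compensated features


# --- hypothetical usage ------------------------------------------------------
rng = np.random.default_rng(0)
n, dim = 200, 50
features = rng.normal(size=(n, dim))              # stand-in for fMRI features
emotion_labels = rng.integers(0, 2, size=n)       # e.g. high vs. low activation
time_blocks = rng.integers(0, 4, size=n)          # which quarter of the session

X_clean = remove_time_effect(features, time_blocks)
clf = SVC(kernel="linear")
print("CV accuracy:", cross_val_score(clf, X_clean, emotion_labels, cv=5).mean())
```

In an actual experiment the number of time factors and the classifier would presumably be tuned per subject by cross-validation, and the factor loadings would be estimated jointly with the emotion subspace rather than by the simple SVD shortcut used here.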