Graduate student: | 高炫凱 Kao, Shiuan-Kai |
---|---|
Thesis title: | 以雙向長短期記憶網路架構混和多時間粒度文字模態改善婚姻治療自動化行為評分系統 Improving Automatic Behavior Rating System of Couple Therapy using Multi-granular Word Fusion Approach with bidirectional LSTM Architecture |
Advisor: | 李祈均 Lee, Chi-Chun |
Oral defense committee: | 曹昱 Tsao, Yu; 賴穎暉 Lai, Ying-Hui; 李宏毅 Lee, Hung-Yi |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2018 |
Academic year of graduation: | 106 (ROC calendar, 2017-2018) |
Language: | Chinese |
Number of pages: | 39 |
Chinese keywords: | 人類行為訊號處理、多時間粒度、深度學習、情感計算、自然語言處理 |
English keywords: | behavioral signal processing, multi-granularity, deep learning, affective computing, natural language processing |
In psychological research, experts often design an experimental procedure, such as a consultation, a performance, or a discussion, to observe a person's mental state, expecting external behavioral stimuli to elicit internal emotional reactions. When an entire interaction is analyzed, however, segments of different lengths carry emotional information of different strengths, and experts integrate the information from these segments to reach a more complete and appropriate decision. Inspired by this idea, this thesis applies it to an automatic behavior rating system for a couple therapy database, in order to improve the accuracy with which a machine scores the interaction process in psychotherapy. The underlying study recruited seriously and chronically distressed married couples and had each couple hold a problem-solving conversation on a specific topic while the audio, video, and text of the session were recorded. From these recordings, the extent of each spouse's behaviors during the interaction can be analyzed to evaluate the effect of treatment.
This thesis applies a bidirectional long short-term memory (bidirectional LSTM) architecture to the lexical modality to extract high-level features at multiple temporal granularities. It combines these with document-level Doc2Vec vectors through feature selection, so that behavioral representations from different temporal levels are integrated, and finally adds the audio modality to train a binary classifier. Over six behavioral codes, the average accuracy reaches 79.3% for husbands and 82.4% for wives, improvements of 5.3 and 7.4 percentage points over the 74% and 75% reported in previous work [1]. The experiments show that a deep bidirectional LSTM learns time-series information effectively, and that computing behavioral intensity at each temporal granularity improves the overall accuracy of behavior recognition in couple therapy.
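As a rough illustration of the multi-granular idea described in the abstract, the sketch below splits one transcript into contiguous segments at several granularities and fuses per-source feature vectors by concatenation. It is a minimal sketch, not the thesis's implementation: the function names, the granularity levels (1, 2, 4), and the use of plain concatenation for fusion are all assumptions made for illustration, and the bidirectional LSTM encoder, Doc2Vec vectors, and feature selection are left out.

```python
from typing import Dict, List, Sequence


def split_evenly(tokens: Sequence[str], n_segments: int) -> List[List[str]]:
    """Split a token sequence into n contiguous segments of near-equal length."""
    if n_segments < 1:
        raise ValueError("n_segments must be >= 1")
    base, extra = divmod(len(tokens), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments absorb one leftover token each.
        end = start + base + (1 if i < extra else 0)
        segments.append(list(tokens[start:end]))
        start = end
    return segments


def multi_granular_views(
    tokens: Sequence[str], levels: Sequence[int] = (1, 2, 4)
) -> Dict[int, List[List[str]]]:
    """Map each granularity level to its list of segments.

    The default levels are hypothetical; the abstract does not state
    which granularities the thesis uses.
    """
    return {n: split_evenly(tokens, n) for n in levels}


def fuse(feature_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Fuse feature vectors from several sources by concatenation (an assumption)."""
    fused: List[float] = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


if __name__ == "__main__":
    transcript = "we never talk about money anymore".split()
    for level, segments in multi_granular_views(transcript).items():
        print(level, segments)
```

In a full system, each segment would be encoded by a sequence model and the resulting vectors, together with document-level and audio features, would be passed to `fuse` before classification.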
[1] Xia, Wei, et al. "A dynamic model for behavioral analysis of couple interactions using acoustic features." Sixteenth Annual Conference of the International Speech Communication Association. 2015.
[2] Mehrabian, Albert. Silent messages. Vol. 8. Belmont, CA: Wadsworth, 1971.
[3] Peräkylä, Anssi, and Johanna Elisabeth Ruusuvuori. Facial expression and interactional regulation of emotion. Oxford University Press, 2012.
[4] Skinner, Burrhus Frederic. Science and human behavior. Simon and Schuster, 1953.
[5] Berelson, Bernard, and Gary A. Steiner. "Human behavior: An inventory of scientific findings." (1964).
[6] Little, Paul, et al. "Observational study of effect of patient centredness and positive approach on outcomes of general practice consultations." BMJ 323.7318 (2001): 908-911.
[7] Gersten, Russell M., Douglas W. Carnine, and Paul B. Williams. "Measuring implementation of a structured educational model in an urban school district: An observational approach." Educational Evaluation and Policy Analysis 4.1 (1982): 67-79.
[8] Narayanan, Shrikanth, and Panayiotis G. Georgiou. "Behavioral signal processing: Deriving human behavioral informatics from speech and language." Proceedings of the IEEE 101.5 (2013): 1203-1233.
[9] Boose, John H. "Personal Construct Theory and the Transfer of Human Expertise." AAAI. Vol. 84. 1984.
[10] Chen, Chin-Po, et al. "Computing Multimodal Dyadic Behaviors during Spontaneous Diagnosis Interviews toward Automatic Categorization of Autism Spectrum Disorder." INTERSPEECH. 2017.
[11] Huang, Wen-Yu, et al. "Enhancement of Automatic Oral Presentation Assessment System Using Latent N-Grams Word Representation and Part-of-Speech Information." INTERSPEECH. 2016.
[12] Tsai, Fu-Sheng, et al. "Toward Development and Evaluation of Pain Level-Rating Scale for Emergency Triage based on Vocal Characteristics and Facial Expressions." INTERSPEECH. 2016.
[13] Kim, Kyung Hwan, Seok Won Bang, and Sang Ryong Kim. "Emotion recognition system using short-term monitoring of physiological signals." Medical and biological engineering and computing 42.3 (2004): 419-427.
[14] Busso, Carlos, et al. "Analysis of emotion recognition using facial expressions, speech and multimodal information." Proceedings of the 6th international conference on Multimodal interfaces. ACM, 2004.
[15] Lin, Wei-Cheng, and Chi-Chun Lee. "A thin-slice perception of emotion? An information theoretic-based framework to identify locally emotion-rich behavior segments for global affect recognition." Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
[16] Christensen, Andrew, et al. "Traditional versus integrative behavioral couple therapy for significantly and chronically distressed married couples." Journal of consulting and clinical psychology 72.2 (2004): 176.
[17] Black, Matthew, et al. "Automatic classification of married couples' behavior using audio features." Eleventh Annual Conference of the International Speech Communication Association. 2010.
[18] Black, Matthew P., et al. "Toward automating a human behavioral coding system for married couples’ interactions using speech acoustic features." Speech Communication 55.1 (2013): 1-21.
[19] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
[20] Lawrence, Steve, et al. "Face recognition: A convolutional neural-network approach." IEEE Transactions on Neural Networks 8.1 (1997): 98-113.
[21] Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. "Speech recognition with deep recurrent neural networks." Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
[22] Graves, Alex, and Navdeep Jaitly. "Towards end-to-end speech recognition with recurrent neural networks." Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014.
[23] Lewis, Mike, et al. "Deal or no deal? end-to-end learning for negotiation dialogues." arXiv preprint arXiv:1706.05125 (2017).
[24] Christensen, Andrew, Neil S. Jacobson, and Julia C. Babcock. "Integrative behavioral couple therapy." (1995).
[25] Christensen, A. "Social support interaction rating system." Unpublished measure, University of California, Los Angeles (1999).
[26] Heavey, C., D. Gill, and A. Christensen. "Couples interaction rating system 2 (CIRS2)." University of California, Los Angeles 7 (2002).
[27] Williams-Baucom, Katherine J., et al. "“You” and “I” need to talk about “us”: Linguistic patterns in marital interactions." Personal Relationships 17.1 (2010): 41-56.
[28] Moreno, Pedro J., et al. "A recursive algorithm for the forced alignment of very long audio segments." Fifth International Conference on Spoken Language Processing. 1998.
[29] Katsamanis, Athanasios, et al. "SailAlign: Robust long speech-text alignment." Proc. of Workshop on New Tools and Methods for Very-Large Scale Phonetics Research. 2011.
[30] Ghosh, Prasanta Kumar, Andreas Tsiartas, and Shrikanth Narayanan. "Robust voice activity detection using long-term signal variability." IEEE Transactions on Audio, Speech, and Language Processing 19.3 (2011): 600-613.
[31] Eyben, Florian, Martin Wöllmer, and Björn Schuller. "openSMILE: The Munich versatile and fast open-source audio feature extractor." Proceedings of the 18th ACM international conference on Multimedia. ACM, 2010.
[32] Boersma, Paul, and David Weenink. "2001. Praat. A system for doing phonetics by computer." (1992).
[33] Hinton, Geoffrey E. "Learning distributed representations of concepts." Proceedings of the eighth annual conference of the cognitive science society. Vol. 1. 1986.
[34] Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.
[35] Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
[36] Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. "Linguistic regularities in continuous space word representations." HLT-NAACL. Vol. 13. 2013.
[37] Mohammad, Saif M., and Peter D. Turney. "Crowdsourcing a word–emotion association lexicon." Computational Intelligence 29.3 (2013): 436-465.
[38] Mohammad, Saif M., Svetlana Kiritchenko, and Xiaodan Zhu. "NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets." arXiv preprint arXiv:1308.6242 (2013).
[39] Chikersal, Prerna, Soujanya Poria, and Erik Cambria. "SeNTU: Sentiment Analysis of Tweets by Combining a Rule-based Classifier with Supervised Learning." SemEval@ NAACL-HLT. 2015.
[40] Majumder, Navonil, et al. "Deep Learning-Based Document Modeling for Personality Detection from Text." IEEE Intelligent Systems 32.2 (2017): 74-79.
[41] Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014.
[42] Rosenblatt, Frank. "The perceptron: A probabilistic model for information storage and organization in the brain." Psychological review 65.6 (1958): 386.
[43] Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. "Learning representations by back-propagating errors." Cognitive modeling 5.3 (1988): 1.
[44] Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.
[45] Socher, Richard, et al. "Recursive deep models for semantic compositionality over a sentiment treebank." Proceedings of the 2013 conference on empirical methods in natural language processing. 2013.
[46] Mikolov, Tomas, et al. "Recurrent neural network based language model." Interspeech. Vol. 2. 2010.
[47] Sundermeyer, Martin, et al. "Translation Modeling with Bidirectional Recurrent Neural Networks." EMNLP. 2014.
[48] Schuster, Mike, and Kuldip K. Paliwal. "Bidirectional recurrent neural networks." IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.
[49] Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE Transactions on Neural Networks 5.2 (1994): 157-166.
[50] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
[51] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
[52] Jean, Sébastien, et al. "On using very large target vocabulary for neural machine translation." arXiv preprint arXiv:1412.2007 (2014).
[53] Luong, Minh-Thang, et al. "Addressing the rare word problem in neural machine translation." arXiv preprint arXiv:1410.8206 (2014).
[54] Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
[55] Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised learning of video representations using LSTMs." International Conference on Machine Learning. 2015.
[56] Dai, Andrew M., and Quoc V. Le. "Semi-supervised sequence learning." Advances in Neural Information Processing Systems. 2015.
[57] Tseng, Shao-Yen, et al. "Couples Behavior Modeling and Annotation Using Low-Resource LSTM Language Models." INTERSPEECH. 2016.
[58] Sainath, Tara N., et al. "Convolutional, long short-term memory, fully connected deep neural networks." Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015.
[59] Liu, Bing. "Sentiment analysis and opinion mining." Synthesis lectures on human language technologies 5.1 (2012): 1-167.
[60] Fu, Guohong, and Xin Wang. "Chinese sentence-level sentiment classification based on fuzzy sets." Proceedings of the 23rd International Conference on Computational Linguistics: Posters. Association for Computational Linguistics, 2010.
[61] Hashimoto, Kazuma, et al. "A joint many-task model: Growing a neural network for multiple NLP tasks." arXiv preprint arXiv:1611.01587 (2016).
[62] Lipton, Zachary C., et al. "Learning to diagnose with LSTM recurrent neural networks." arXiv preprint arXiv:1511.03677 (2015).
[63] Li, Xin, et al. "Weighted multi-label classification model for sentiment analysis of online news." Big Data and Smart Computing (BigComp), 2016 International Conference on. IEEE, 2016.