| Graduate student: | 雷諾 Jollet De Lorenzo, Renaud |
|---|---|
| Thesis title: | 情緒文字分類器:運用情緒相似度強化文字組合模式之學習 Emotion Text Classification: Enhancing Patterns Learning Using Emotion Similarities |
| Advisor: | 陳宜欣 Chen, Yi-Shin |
| Oral defense committee: | 蘇豐文 Soo, Von-Wun; 陳朝欽 Chen, Chaur-Chin |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
| Year of publication: | 2017 |
| Academic year of graduation: | 105 |
| Language: | English |
| Number of pages: | 39 |
| Keywords (Chinese): | 情緒分類、文字探勘、文本樣式、整體模型 |
| Keywords (English): | Emotion Classification, Text Mining, Text Patterns, Ensemble Model |
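Within text analysis, classifying the emotion of a text is a difficult problem. Textual information can be quantified in several ways, for example through words, n-grams, or pattern analysis. In this study, we focus on some interesting properties of emotion classifiers based on text patterns. On social networks, emotions such as joy or sadness are expressed frequently and precisely, whereas emotions such as fear or disgust are comparatively rare and take varied forms. This imbalance in the data causes a classifier to perform very differently across emotions. Beyond the imbalance, text in micro-blogs, comments, and short messages also tends to consist of quickly written, colloquial wording. Researchers in social science and psychology have defined the individual emotions and described the similarities and distances between them. For example, anger is closer to disgust than to happiness. This thesis describes a method that uses this prior knowledge to improve emotion classification of short texts. The prior knowledge about emotion distances and similarities is used to learn how text features transfer from one emotion to another. This knowledge-transfer method makes decisions jointly with another emotion classifier and improves that classifier's performance.
We compare against other methods using ranking and multi-label metrics. Our experimental results show that our method improves the classification scores for rare emotions as well as performance on the multi-label and ranking tests.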
Text emotion classification is a challenging topic in the text mining field. Textual information can be quantified with various approaches that use words, character n-grams, or patterns. In this research, we explore and highlight some interesting properties of text-based patterns for emotion classification. Emotions like joy and sadness are expressed often and clearly on social media, whereas emotions such as fear or disgust are sparser and take more varied forms. This imbalance in the data makes a classifier's performance inconsistent across emotions. In addition to this imbalance, text in micro-blogs, comments, and short messages tends to be quickly composed, spoken-style language. Researchers in social science and psychology have defined emotions and described the similarities and distances between them. For instance, anger is closer to disgust than it is to joy. This paper describes an approach that uses this prior knowledge to improve emotion labeling of short texts. The prior knowledge about emotion distances and similarities is used to transfer the text features learned for one emotion to other emotions. Ensembling this knowledge-transfer approach with another emotion classifier improves the performance of that classifier. We use ranking and multi-label metrics to compare different models. Our experiments show that classification scores for rare emotions, as well as multi-label and ranking performance, have increased.
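The idea of combining emotion similarities with another classifier's output can be illustrated with a small Python sketch. This is not the method from the thesis, whose details the abstract does not give. It is a minimal illustration under stated assumptions: the emotion list, the similarity values, the mixing weight `alpha`, and the function names `transfer_scores`, `ensemble`, and `rank` are all hypothetical. The sketch spreads each emotion's score to similar emotions through a similarity-weighted average, then blends the result with the base scores.

```python
# Illustrative sketch only: all names and numbers below are assumptions,
# not values or procedures taken from the thesis.

EMOTIONS = ["joy", "sadness", "anger", "disgust", "fear"]

# Hypothetical pairwise similarities in [0, 1]. Unlisted pairs count as 0.
# They only encode the idea that anger is closer to disgust than to joy.
SIMILARITY = {
    ("anger", "disgust"): 0.7,
    ("fear", "sadness"): 0.4,
    ("anger", "fear"): 0.3,
}


def similarity(a, b):
    """Symmetric lookup; an emotion is fully similar to itself."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def transfer_scores(base_scores):
    """Similarity-weighted average of the base scores for each target emotion."""
    transferred = {}
    for target in EMOTIONS:
        weights = [similarity(target, source) for source in EMOTIONS]
        weighted_sum = sum(w * base_scores[s] for w, s in zip(weights, EMOTIONS))
        transferred[target] = weighted_sum / sum(weights)
    return transferred


def ensemble(base_scores, alpha=0.5):
    """Blend base scores with transferred scores; alpha weights the base model."""
    transferred = transfer_scores(base_scores)
    return {e: alpha * base_scores[e] + (1 - alpha) * transferred[e] for e in EMOTIONS}


def rank(scores):
    """Emotions ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical output of a base classifier for one short text.
base = {"joy": 0.1, "sadness": 0.1, "anger": 0.8, "disgust": 0.05, "fear": 0.1}
combined = ensemble(base)
print(rank(combined))
```

In this toy input the base classifier ranks disgust last, while the blended scores move it up to second place behind anger because of the assumed anger-disgust similarity. That is the kind of effect on rare emotions that a ranking metric would pick up.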