Graduate student: | 王晨安 Wang, Chen-An |
---|---|
Thesis title: | 基於中文情緒構面模型之電影評論意見分析 Valence-Arousal Dimension-based Opinion Mining for Movie Reviews |
Advisor: | 許聞廉 Hsu, Wen-Lian |
Oral defense committee: | 張詠淳 Chang, Yung-Chun; 戴鴻傑 Dai, Hong-Jie; 古倫維 Ku, Lun-Wei |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
Year of publication: | 2017 |
Academic year of graduation: | 105 |
Language: | Chinese |
Pages: | 55 |
Chinese keywords: | opinion mining, valence-arousal dimensions, sentiment analysis, word embeddings, movie reviews, ontology |
English keywords: | Opinion Mining, Valence&Arousal Dimension, Sentiment Analysis, Word Embeddings, Movie Review, Ontology |
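Since the advent of Web 2.0 services, ever more information has filled our lives, and user-generated comments and opinions are now shared and exchanged through social networking sites. In recent years, fields such as data mining, machine learning, and natural language processing have flourished and have begun to apply semantic and sentiment analysis to this data. In business, such analysis helps decision makers predict market trends and identify potential customers. It also bears on issues such as people's livelihoods, healthcare, and climate.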
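This study extends the dimensional concept of emotion proposed by Russell (1980). Valence represents the positive or negative polarity of a word, and Arousal represents its degree of emotional intensity. In the valence-arousal space, every Chinese word has a Valence value and an Arousal value, each scored from 1 to 10. We further use these dimensions to perform opinion mining on movie reviews and determine whether a user recommends a given movie. First, we collect movie review posts tagged 「好雷」 (positive review with spoilers) and 「負雷」 (negative review with spoilers) from the PTT Movie board and process them with natural language processing. We then represent the features of the training and test data using distributed keyword vectors. We also compare this approach with well-known sentiment analysis methods, such as LDA-SVM, Naïve Bayes, K-NN, tf-idf, and Delta tf-idf, to assess its effectiveness.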
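The abstract does not say how keyword vectors and valence-arousal scores are combined into document features. The sketch below shows one plausible combination under stated assumptions; it is not the thesis's actual method:

- A review is represented by the mean of its keyword embeddings.
- The mean Valence and Arousal of its words that appear in a VA lexicon are appended to that vector.
- A nearest-centroid rule assigns a recommend (`positive`) or not-recommend (`negative`) label.

The embeddings, VA ratings and function names (`doc_features`, `centroid`, `predict`) are hypothetical toy values chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def doc_features(tokens: Sequence[str],
                 embeddings: Dict[str, Vector],
                 va_lexicon: Dict[str, Tuple[float, float]],
                 dim: int) -> Vector:
    """Mean keyword embedding followed by mean (valence, arousal) of lexicon words."""
    total = [0.0] * dim
    n_vec = 0
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is not None:  # skip out-of-vocabulary tokens
            total = [a + b for a, b in zip(total, vec)]
            n_vec += 1
    mean_vec = [x / n_vec for x in total] if n_vec else total
    rated = [va_lexicon[t] for t in tokens if t in va_lexicon]
    if rated:
        mean_v = sum(v for v, _ in rated) / len(rated)
        mean_a = sum(a for _, a in rated) / len(rated)
    else:
        mean_v = mean_a = 5.5  # assumed neutral midpoint of the 1-10 scale
    return mean_vec + [mean_v, mean_a]


def centroid(rows: List[Vector]) -> Vector:
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(x: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))


# Hypothetical 2-D embeddings and VA ratings, for illustration only.
EMB = {"好看": [0.9, 0.1], "感人": [0.8, 0.2], "劇情": [0.5, 0.5],
       "無聊": [0.1, 0.9], "難看": [0.2, 0.8]}
VA = {"好看": (7.5, 6.0), "感人": (7.8, 6.5), "無聊": (2.5, 3.0), "難看": (2.0, 5.0)}

train = {
    "positive": [["劇情", "好看"], ["感人", "好看"]],
    "negative": [["劇情", "無聊"], ["難看", "無聊"]],
}
centroids = {label: centroid([doc_features(d, EMB, VA, 2) for d in docs])
             for label, docs in train.items()}

if __name__ == "__main__":
    print(predict(doc_features(["感人", "劇情"], EMB, VA, 2), centroids))  # positive
```

The VA scores use a 1-10 scale, which is much larger than the toy embedding components, so they dominate the distance here. A real system would normally standardize features before classification.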
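The results show that our method performs well in valence-arousal prediction. It can predict Valence and Arousal values for newly emerging Chinese words, and it achieves the best performance among the many current methods and models. In opinion mining for movie reviews, the method accounts for the emotional polarity and intensity behind the words an author uses. This helps it accurately capture the core of an article, and it outperforms commonly used sentiment analysis methods with an accuracy of 85.5%.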
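The claim that Valence and Arousal can be predicted for new words suggests propagating ratings from a seed lexicon through an embedding space. Below is a minimal sketch, assuming word vectors are available for both the seed words and the unseen word. It takes the k seed words most cosine-similar to the new word and averages their ratings, weighted by similarity. This is a generic nearest-neighbour technique in the spirit of similarity-based propagation such as the weighted-graph method of Yu et al. [18]. It is not necessarily the thesis's model, and all vectors and ratings are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_va(word_vec: Sequence[float],
               seed_vectors: Dict[str, List[float]],
               seed_va: Dict[str, Tuple[float, float]],
               k: int = 3) -> Tuple[float, float]:
    """Similarity-weighted mean (valence, arousal) of the k nearest seed words."""
    scored = sorted(((cosine(word_vec, vec), w)
                     for w, vec in seed_vectors.items() if w in seed_va),
                    reverse=True)[:k]
    neighbours = [(s, w) for s, w in scored if s > 0]  # ignore dissimilar words
    if not neighbours:
        return (5.5, 5.5)  # assumed neutral fallback on the 1-10 scale
    total = sum(s for s, _ in neighbours)
    valence = sum(s * seed_va[w][0] for s, w in neighbours) / total
    arousal = sum(s * seed_va[w][1] for s, w in neighbours) / total
    return (valence, arousal)


# Hypothetical seed lexicon and 2-D vectors, for illustration only.
SEED_VEC = {"開心": [1.0, 0.0], "快樂": [0.9, 0.1], "難過": [0.0, 1.0]}
SEED_VA = {"開心": (8.0, 7.0), "快樂": (7.6, 6.4), "難過": (2.0, 4.0)}

if __name__ == "__main__":
    v, a = predict_va([0.95, 0.05], SEED_VEC, SEED_VA, k=2)
    print(f"valence={v:.2f}, arousal={a:.2f}")
```

Restricting to positively similar neighbours keeps antonym-like vectors from pulling the estimate in the wrong direction. When no neighbour qualifies, the sketch falls back to an assumed neutral score.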
Keywords: opinion mining, valence-arousal dimensions, ontology, word embeddings, movie reviews, sentiment analysis
Since Web 2.0 services began, more and more information has filled our lives. At the same time, users have begun sharing comments and opinions through social media. In recent years, the fields of data mining, machine learning, and natural language processing have begun to focus on semantic and emotion analysis. In business, these techniques help decision makers predict market trends and discover potential customers. They also bear on people's welfare, medicine, climate, and other issues.
In this study, we extend the dimensional theory of emotion proposed by Russell in 1980. Valence indicates the positive or negative polarity of a word, and Arousal indicates its degree of emotion. In the valence-arousal dimensions, every Chinese word has a Valence value and an Arousal value, both ranging from 1 to 10. We further apply these dimensions to opinion mining on Chinese movie reviews. To determine whether a user recommends a movie, we collected movie reviews from the PTT movie forum and processed them with natural language processing techniques. We then use distributed keyword vectors to represent the training and testing features. We also compare our method with well-known sentiment analysis methods such as LDA-SVM, Naïve Bayes, K-NN, tf-idf, and Delta tf-idf to evaluate its performance.
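Delta tf-idf, one of the baselines listed above, weights each term by how unevenly it is distributed between the positive and negative training sets. The sketch below follows the commonly cited formulation of Martineau and Finin (2009):

V(t,d) = C(t,d) · log2((|P| · N_t) / (|N| · P_t))

The symbols are:

- C(t,d) is the count of term t in document d.
- |P| and |N| are the numbers of positive and negative training documents.
- P_t and N_t are the numbers of positive and negative training documents that contain t.

The add-one smoothing on P_t and N_t is an assumption added here to avoid division by zero. The thesis does not state its exact baseline configuration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def delta_tfidf(doc: Sequence[str],
                pos_docs: List[Sequence[str]],
                neg_docs: List[Sequence[str]]) -> Dict[str, float]:
    """Delta tf-idf weight for each term in `doc`, using add-one smoothed document frequencies."""
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    # Document frequency: count each term at most once per document.
    pos_df = Counter(t for d in pos_docs for t in set(d))
    neg_df = Counter(t for d in neg_docs for t in set(d))
    weights = {}
    for term, count in Counter(doc).items():
        ratio = (n_pos * (neg_df[term] + 1)) / (n_neg * (pos_df[term] + 1))
        weights[term] = count * math.log2(ratio)
    return weights


# Hypothetical segmented training reviews, for illustration only.
POS = [["好看", "劇情"], ["好看", "感人"]]
NEG = [["無聊", "劇情"], ["難看", "無聊"]]

if __name__ == "__main__":
    for term, w in delta_tfidf(["好看", "好看", "劇情"], POS, NEG).items():
        print(term, round(w, 3))
```

Under this sign convention, terms that lean toward the positive class receive negative weights and vice versa. A term spread evenly across both classes scores zero, as 劇情 does in the toy example.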
The experimental results show that our method achieves the best performance on Valence and Arousal prediction and can also predict Valence and Arousal values for unknown words. In opinion mining for movie reviews, our method takes into account the emotional polarity and degree of the writer's words. As a result, it helps us grasp the core of an article accurately and achieves 85.3% accuracy.
Keywords: Opinion Mining, Valence&Arousal Dimension, Ontology, Word Embeddings, Movie Review, Sentiment Analysis
[1] P. Kalaivani and K. L. Shunmuganathan, "Sentiment Classification of Movie Reviews by Supervised Machine Learning Approaches," Indian Journal of Computer Science and Engineering, vol. 4, no. 4, pp. 285-292, 2013.
[2] T. R. Gruber, "A translation approach to portable ontology specifications," Knowl. Acquis., vol. 5, pp. 199-220, 1993.
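[3] 鍾明強, "基於Ontology架構之文件分類網路服務研究與建構 (Research and Construction of an Ontology-Based Document Classification Web Service) [In Chinese]," M.S. thesis, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 2004.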
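[4] 曾新穆 and 李健興, "支援語意空間的Ontology擷取與建構技術研究 (Research on Ontology Extraction and Construction Techniques Supporting Semantic Space) [In Chinese]," final report of the ROC year 91 research project subcontracted to academic institutions, Institute for Information Industry, 2002.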
[5] N. Shadbolt, T. Berners-Lee, and W. Hall, "The Semantic Web Revisited," IEEE Intelligent Systems, vol. 21, pp. 96-101, 2006.
[6] C. O. Alm, D. Roth, and R. Sproat, "Emotions from text: machine learning for text-based emotion prediction," presented at the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP), 2005.
[7] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau, "Sentiment analysis of twitter data," presented at the Association for Computational Linguistics, 2011.
[8] C. Po Yung, "Automatic Ontology Construction by Using a Chinese Parser and a Lexical Knowledge Base," 2009.
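[9] 宋啟聖, "詞網同義詞集的中文語意表達之研究 (A Study of the Chinese Semantic Representation of WordNet Synsets) [In Chinese]," M.S. thesis, Department of Information Science, Soochow University, Taipei, 2003.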
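[10] 陳信裕, "利用廣義知網及維基百科於劇本文件之廣告推薦 (Advertisement Recommendation for Screenplay Documents Using E-HowNet and Wikipedia) [In Chinese]," M.S. thesis, Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei, 2016.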
[11] 李政儒, 游基鑫, and 陳信希, "廣義知網詞彙意見極性的預測 (Predicting the Opinion Polarity of E-HowNet Words) [In Chinese]," Computational Linguistics and Chinese Language Processing, pp. 21-36, 2012.
[12] "Discrete Emotions or Dimensions? The Role of Valence Focus and Arousal Focus," Cognition and Emotion, vol. 12, pp. 579-599, 1998.
[13] E. K. Gray and D. Watson, "Assessing positive and negative affect via self-report," Handbook of Emotion Elicitation and Assessment, pp. 171-183, 2007.
[14] R. Plutchik, "A general psychoevolutionary theory of emotion," Theories of Emotion, pp. 3-34, 1980.
[15] H. Schlosberg, "Three dimensions of emotion," Psychological Review, pp. 81-88, 1954.
[16] C. E. Osgood, "The nature and measurement of meaning," Psychological Bulletin, vol. 49, pp. 197-237, 1952.
[17] J. A. Russell, "A circumplex model of affect," Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1161-1178, 1980.
[18] L.-C. Yu, J. Wang, K. R. Lai, and X.-j. Zhang, "Predicting Valence-Arousal Ratings of Words Using a Weighted Graph Method," presented at the Association for Computational Linguistics, 2015.
[19] W.-C. Chou, C.-K. Lin, Y.-R. Wang, and Y.-F. Liao, "Evaluation of Weighted Graph and Neural Network Models on Predicting the Valence Arousal Ratings of Chinese Words," presented at the International Conference on Asian Language Processing, 2016.
[20] H.-Y. Wang and W.-Y. Ma, "CKIP Valence-Arousal Predictor for IALP 2016 Shared Task," presented at the International Conference on Asian Language Processing, 2016.
[21] T. H. Chang and Y. T. Siao, "Constructing a Chinese valence-arousal dictionary with multiple heterogeneous lexicons," presented at the 2016 International Conference on Asian Language Processing (IALP), 2016.
[22] J. Turian, L. Ratinov, and Y. Bengio, "Word representations: a simple and general method for semi-supervised learning," presented at the Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 2010.
[23] G. E. Hinton, "Learning Distributed Representations of Concepts," presented at the Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1986.
[24] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient Estimation of Word Representations in Vector Space," CoRR, vol. abs/1301.3781, 2013.
[25] A. Mnih and K. Kavukcuoglu, "Learning word embeddings efficiently with noise-contrastive estimation," presented at the NIPS, 2013.
[26] S.-C. Chen, H.-T. Hung, and B. Chen, "Exploring Word Embedding and Concept Information for Language Model Adaptation in Mandarin Large Vocabulary Continuous Speech Recognition," in Computational Linguistics and Speech Processing, 2015.
[27] L. Qiu, Y. Cao, Z. Nie, Y. Yu, and Y. Rui, "Learning Word Representation Considering Proximity and Ambiguity," presented at the Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
[28] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, "Learning word vectors for sentiment analysis," presented at the Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, Portland, Oregon, 2011.
[29] A. A. Altowayan and L. Tao, "Word embeddings for Arabic sentiment analysis," presented at the 2016 IEEE International Conference on Big Data (Big Data), 2016.
[30] C. H. Chu, C. A. Wang, Y. C. Chang, Y. W. Wu, Y. L. Hsieh, and W. L. Hsu, "Sentiment analysis on Chinese movie review with distributed keyword vector representation," presented at the 2016 Conference on Technologies and Applications of Artificial Intelligence (TAAI), 2016.
[31] H.-C. Tseng, H.-T. Hung, Y.-T. Sung, and B. Chen, "基於深層類神經網路及表示學習技術之文件可讀性分類(Classification of Text Readability Based on Deep Neural Network and Representation Learning Techniques)[In Chinese]," in ROCLING, 2016.
[32] A.-L. Chiu, "Application of Data Mining Techniques to Detect the Changes of Customer Behavior," 2003.
[33] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, vol. 21. Menlo Park, CA: AAAI Press, 1996.
[34] P. D. Turney, "Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews," presented at the Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Philadelphia, Pennsylvania, 2002.
[35] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? sentiment classification using machine learning techniques," presented at the Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10, 2002.
[36] T. Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998.
[37] J. Huang, J. Lu, and C. X. Ling, "Comparing Naive Bayes, Decision Trees, and SVM with AUC and Accuracy," presented at the Proceedings of the Third IEEE International Conference on Data Mining, 2003.
[38] A. McCallum and K. Nigam, "A Comparison of Event Models for Naive Bayes Text Classification," presented at the Learning for Text Categorization: Papers from the 1998 AAAI Workshop, 1998.
[39] V. Metsis, I. Androutsopoulos, and G. Paliouras, "Spam Filtering with Naive Bayes - Which Naive Bayes?," presented at the CEAS, 2008.
[40] S.-H. Lin, M. C. Chen, J.-M. Ho, and Y.-M. Huang, "ACIRD: intelligent Internet document organization and retrieval," IEEE Transactions on Knowledge and Data Engineering, vol. 14, pp. 599-614, 2002.
[41] G. Salton and C. Buckley, "Term Weighting Approaches in Automatic Text Retrieval," 1987.
[42] L. Zhao and C. Li, Ontology Based Opinion Mining for Movie Reviews. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009.
[43] L. Devillers, L. Vidrascu, and L. Lamel, "Challenges in real-life emotion annotation and machine learning based detection," Neural Networks, vol. 18, pp. 407-422, 2005.
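[44] 邱鴻達, "意見探勘在中文電影評論之應用 (Applying Opinion Mining to Chinese Movie Reviews) [In Chinese]," M.S. thesis, Institute of Computer Science and Engineering, National Chiao Tung University, Hsinchu, 2011.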
[45] H. Wen Juan and C. Chuang Ping, "Sentiment Classification for Movie Reviews in Chinese Using Parsing-based Methods," presented at the International Joint Conference on Natural Language Processing, Nagoya, Japan, 2013.
[46] C. Fellbaum, "WordNet," in The Encyclopedia of Applied Linguistics, ed: John Wiley & Sons, Inc., 2012.
[47] J. Kamps, M. Marx, R. J. Mokken, and M. de Rijke, "Using WordNet to measure semantic orientation of adjectives," presented at the LREC 2004, 2004.
[48] L.-C. Yu, L.-H. Lee, and K.-F. Wong, "Overview of the IALP 2016 Shared Task on Dimensional Sentiment Analysis for Chinese Words," presented at the International Conference on Asian Language Processing, 2016.
[49] Y. L. Hsieh, S. H. Liu, Y. C. Chang, and W. L. Hsu, "Distributed keyword vector representation for document categorization," presented at the 2015 Conference on Technologies and Applications of Artificial Intelligence (TAAI), 2015.
[50] J. Cohen, Statistical power analysis for the behavioral sciences. Hillsdale, N.J.: L. Erlbaum Associates, 1988.