
Graduate student: Carlos Rene Argueta Tejeda (安凱若)
Thesis title: Multilingual Emotion Classifier using Unsupervised Pattern Extraction from Microblog Data (非監督型的多語言情緒分類法)
Advisor: Chen, Yi Shin (陳宜欣)
Committee members: Soo, Von-Wun (蘇豐文)
Chen, Chaur-Chin (陳朝欽)
Hon, Win-Kai (韓永楷)
Chang, Jason (張俊盛)
Huang, Tsung-Ren (黃從仁)
Ku, Lun-Wei (古倫維)
Hsu, Jane (許永真)
Degree: Doctor
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2015
Academic year of graduation: 104
Language: English
Number of pages: 60
Keywords (Chinese): social network emotion (社群網絡情緒)
Keywords (English): microblog, patterns
  • The expansion of social network services has brought a parallel growth in user-generated content on the web, and microblogs have become a common and popular channel of expression. In recent years the opinions posted there have gained importance, because when analyzed and interpreted correctly they provide information useful for many purposes. One such analysis seeks to understand how people feel about or react to a specific topic. The most basic approaches attempt to determine whether a given text expresses a positive or negative opinion of a given subject. A more detailed alternative is to classify texts into defined, emotion-specific categories, which makes it crucial to devise algorithms that efficiently identify the emotions expressed in opinionated content.
    Traditional emotion classifiers require extracting high-dimensional feature representations, which are computationally expensive to process and can reduce a classifier's accuracy.

    In this thesis we propose an unsupervised graph-based algorithm to extract emotion-bearing patterns from microblog posts. Using the extracted patterns, we define a classification method that efficiently identifies the emotions expressed in posts without depending on a predefined emotion dictionary or ontology.
    The system also takes into account that, in these global, connected networks, content originates from different geographic locations, cultures, and languages. It takes advantage of the pattern extraction method, which enables it to perform successfully in different languages and domains. The experimental results demonstrate that the proposed system can handle English, Spanish, and French tweets with accuracy, generality, adaptability, and minimal supervision.
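
    The abstract does not spell out how the graph is built or how patterns are derived from it, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes whitespace tokenization, a word-adjacency graph weighted by co-occurrence counts, "hub" words chosen by weighted degree, and patterns formed by replacing non-hub words with a `*` wildcard. The function names (`build_word_graph`, `hub_words`, `extract_patterns`) and the parameters `top_k` and `length` are hypothetical.

```python
from collections import Counter


def build_word_graph(posts):
    """Directed graph as edge counts: (a, b) -> how often token b follows token a."""
    edges = Counter()
    for post in posts:
        tokens = post.lower().split()  # assumption: plain whitespace tokenization
        edges.update(zip(tokens, tokens[1:]))
    return edges


def hub_words(edges, top_k):
    """Rank tokens by weighted degree (in + out) and keep the top_k as hubs."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    # Sort by degree, then alphabetically, so ties resolve deterministically.
    ranked = sorted(degree, key=lambda w: (-degree[w], w))
    return set(ranked[:top_k])


def extract_patterns(posts, hubs, length=3):
    """Count token windows in which every non-hub token is replaced by '*'."""
    patterns = Counter()
    for post in posts:
        tokens = post.lower().split()
        for i in range(len(tokens) - length + 1):
            window = tokens[i:i + length]
            pattern = tuple(t if t in hubs else "*" for t in window)
            # Keep only windows that mix hub words with at least one wildcard.
            if "*" in pattern and any(t != "*" for t in pattern):
                patterns[pattern] += 1
    return patterns


if __name__ == "__main__":
    posts = ["i am so happy", "i am so sad", "i am very tired"]
    hubs = hub_words(build_word_graph(posts), top_k=3)
    for pattern, count in extract_patterns(posts, hubs).most_common():
        print(" ".join(pattern), count)
```

    Because the sketch uses only token co-occurrence and no dictionary, the same code runs unchanged on posts in any whitespace-delimited language, which is the property the abstract emphasizes. A classifier in this spirit could then score a post by the emotion-labeled patterns it matches, but that step is left out here because the abstract gives no detail on it.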


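    The rapid growth of social network services has led users to generate ever more social data, making microblogs an increasingly common and popular channel of expression. In recent years these user comments have grown in importance, because when analyzed and interpreted correctly they are data that can serve multiple purposes.

    In social media research, one such analysis aims to understand how people feel about or react to a specific topic. The most basic approach measures whether people view the topic favorably or unfavorably. A more detailed approach classifies the data into specific, predefined categories; being able to identify the emotions in a text would benefit many kinds of analysis. Traditional emotion classification methods are costly, and their accuracy is poor.

    This study proposes an unsupervised emotion recognition technique. The technique relies on graph analysis to automatically identify textual patterns that carry emotional meaning, without depending on any predefined emotion dictionary or emotion ontology. It also works across geographic locations, cultures, and languages, and can recognize emotions wherever training data is available. The experimental results show that the technique handles English, Spanish, and French emotion data accurately, and confirm its accuracy and generality.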

    Introduction
    Related Work
        2.1 Sentiment Analysis on Multiple Domains
        2.2 Sentiment Analysis on Microblog Data
        2.3 Multilingual Approaches
    Overview
        3.1 Data and Pre-processing
    Patterns Extraction
        4.1 Graph-Based Word Categorization
            4.1.1 Graph Construction
            4.1.2 Graph Aggregation
            4.1.3 Graph Analysis
        4.2 Emotion Patterns Extraction
    Emotion Classification
        5.1 Emotion Models
        5.2 The Classifier
    Evaluation
        6.1 Experimental Setup
            6.1.1 Data and emotions
        6.2 Experimental Results
            6.2.1 Method Comparison in Multiple Languages
            6.2.2 Multiple Language Evaluation for Reduced Dictionary Size
            6.2.3 Method Comparison with Reduced Dictionaries
            6.2.4 Classifier Adaptation
            6.2.5 Mobile App Testing
                6.2.5.1 Extended Discussion
    Conclusions


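    Full-text release date: the full text is not authorized for public release (on-campus network)
    Full-text release date: the full text is not authorized for public release (off-campus network)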
