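| Graduate student: | 李蒂卡 Ritika Nimje |
|---|---|
| Thesis title: | Pronoun Resolution in Fiction Stories Based on Background Information and Discourse Structure (基於背景資訊與篇章結構來解析小說故事中的代名詞) |
| Advisor: | 蘇豐文 Soo, Von-Wun |
| Oral defense committee: | 陳宜欣 Chen, Yi Shin; 陳朝欽 Chen, Chaur-Chin |
| Degree: | Master (碩士) |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
| Year of publication: | 2016 |
| Academic year of graduation: | 105 |
| Language: | English |
| Number of pages: | 80 |
| Keywords (Chinese): | 背景資訊 (background information), 故事 (story), 代名詞 (pronoun) |
| Keywords (English): | pronoun resolution, fictional stories, background information |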
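Pronoun resolution is a common task in discourse analysis and an important research topic in natural language processing applications. We address pronoun resolution in story text by using background semantic information about the characters in the story. We also extract discourse rules about the speakers of dialogs from the story in order to split the story text into clusters. More precisely, we focus on noun phrases in the text that can co-refer, and we take on the challenge of identifying the characters referred to by pronouns from the context and from the relations associated with them. We use the existing co-reference resolution technique of the Stanford parser to extract entities, noun phrases, and co-reference candidates. Because the Stanford parser reaches only 21% to 35% accuracy on pronoun resolution, we propose an augmented approach in which we assume that some self-annotated data can be provided, together with background information about the noun phrases that denote characters (including relationships such as father, mother, and daughter), to resolve pronouns. We use heuristic rules to split the discourse structure into segments and use the background information to improve the precision and recall of pronoun resolution in stories. We experiment with three stories from different domains and in different styles, The Great Gatsby, A Case of Identity, and Harry Potter, and obtain an improvement of about 50% in precision.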
Pronoun resolution is a well-known task in discourse analysis and an important research issue in natural language processing applications. We tackle pronoun resolution in story text by leveraging background semantic information about the characters together with the discourse structure. Background information includes the relationships between the main characters in the story. We extract general discourse rules about the narrator and the speakers of dialogs to split the story text into clusters. Specifically, we focus on noun phrases that co-refer with identifiable entities in the text; the challenge is to improve pronoun co-reference resolution by leveraging the relations through which the mentions can be identified. Our system extracts entities, noun phrases, and candidate co-references using the Stanford parser's co-reference resolution method. Because this method alone achieves only about 21% to 35% accuracy on pronoun resolution (approximately 20.9%, 27.985%, and 34.98%), we propose an augmented approach. We assume that manually annotated data (about 10% of the full text) can be provided to the Stanford parser in advance, and we use the semantic relatedness of noun phrases to background information about the characters (personal relationships such as "father", "mother", and "daughter") to resolve the co-references. We employ heuristic rules that split the text into segments based on the discourse structure, together with the background information, to improve the recall and precision of pronoun resolution in stories.
We experimented with three stories from different domains and in different writing styles, "The Great Gatsby", "A Case of Identity", and "Harry Potter", and obtained an improvement of about 50% in precision after applying our methods.
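The two ideas in the abstract, splitting the text into dialog and narration segments and using background information about characters to pick an antecedent, can be illustrated with a small sketch. This is not the thesis's actual rule set: the character table, the gender attribute, the quote-based segmentation, and the "most recent compatible character" rule are all assumptions made here for illustration, and the sketch does not call the Stanford parser.

```python
import re

# Hypothetical background information: character name -> known attributes.
BACKGROUND = {
    "Harry": {"gender": "male"},
    "Hermione": {"gender": "female"},
}

# Third-person pronouns and the gender they agree with (illustrative subset).
PRONOUN_GENDER = {
    "he": "male", "him": "male", "his": "male",
    "she": "female", "her": "female", "hers": "female",
}


def split_segments(text):
    """Split text into (kind, content) segments at straight double quotes.

    Odd-numbered pieces lie inside quotes and are treated as dialog;
    even-numbered pieces are treated as narration.
    """
    segments = []
    for i, part in enumerate(text.split('"')):
        part = part.strip()
        if part:
            kind = "dialogue" if i % 2 == 1 else "narration"
            segments.append((kind, part))
    return segments


def resolve_pronouns(segments):
    """Resolve each pronoun to the most recently mentioned character
    whose gender in BACKGROUND agrees with the pronoun."""
    last_seen = {}  # gender -> most recently mentioned character
    resolved = []
    for _kind, content in segments:
        for token in re.findall(r"[A-Za-z]+", content):
            if token in BACKGROUND:
                last_seen[BACKGROUND[token]["gender"]] = token
            elif token.lower() in PRONOUN_GENDER:
                antecedent = last_seen.get(PRONOUN_GENDER[token.lower()])
                if antecedent is not None:
                    resolved.append((token, antecedent))
    return resolved


story = 'Harry waved at Hermione. "She is late," he said.'
segments = split_segments(story)
pairs = resolve_pronouns(segments)
```

For the sample sentence, the segmentation yields one narration segment, one dialog segment, and a closing narration segment, and the pronouns "She" and "he" are linked to Hermione and Harry respectively. A fuller system would replace the gender table with richer relations (for example "father" or "daughter") and would treat first- and second-person pronouns inside dialog according to the identified speaker.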