Pronoun Resolution in Fiction Stories Based on Background Information and Discourse Structure

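Graduate student:	李蒂卡 Ritika Nimje
Thesis title:	Pronoun Resolution in Fiction Stories Based on Background Information and Discourse Structure 基於背景資訊與篇章結構來解析小說故事中的代名詞
Advisor:	蘇豐文 Soo, Von-Wun
Oral defense committee:	陳宜欣 Chen, Yi Shin; 陳朝欽 Chen, Chaur-Chin
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2016
Academic year of graduation:	105
Language:	English
Number of pages:	80
Chinese keywords:	背景資訊故事、代名詞 (background-information stories, pronouns)
English keywords:	pronoun resolution, fictional stories, background information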

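Chinese abstract (translated): Pronoun resolution is a common task in discourse analysis and an important research topic in natural language processing applications. We tackle pronoun resolution in story text by using background semantic information about the characters. We also extract discourse rules about the speakers of dialogs from the story in order to split the story text into clusters. More precisely, we focus on noun phrases in the text that can co-refer, and address the challenge of resolving which character a pronoun refers to by using the mentions in context and the relations among them. We use the co-reference resolution of the existing Stanford Parser to extract entities, noun phrases, and co-reference candidates. Because the Stanford Parser reaches only 21% to 35% accuracy on pronoun resolution, we propose an augmented approach that assumes we can supply some self-annotated data together with background information about the character noun phrases (including relations between characters such as father, mother, and daughter). We use heuristic rules to split the discourse structure into segments and use the background information to improve the precision and recall of pronoun resolution in stories. We experiment on three stories from different domains and in different styles, The Great Gatsby, A Case of Identity, and Harry Potter, and obtain an improvement in precision of about 50%.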

Pronoun resolution is a well-known task in discourse analysis and an important research issue in applications of natural language processing. We tackle the problem of pronoun resolution in textual content by leveraging background semantic information about the characters in a story together with the discourse structure. Background information includes the relationships between the main characters in the story. We extract general discourse rules from the story about the narrator and the speakers of dialogs to split the story text into clusters. Specifically, we focus on noun phrases that co-refer to identifiable entities in the text; the challenge in this context is to improve pronoun co-reference resolution by leveraging the relations through which the mentions can be identified. Our system extracts entities, noun phrases, and candidate co-references using the state-of-the-art co-reference resolution method of the Stanford Parser. Since the Stanford Parser’s co-reference resolution reaches only about 21% to 35% accuracy on the pronoun resolution we need, we propose an augmented approach in which we assume that manually self-annotated data (about 10% of the full text) can be provided to the Stanford Parser in advance, and we use the semantic relatedness of noun phrases to background information about the characters (person relationships such as “father”, “mother”, and “daughter”) to resolve the co-references. We employ heuristic rules that split the text into segments based on the discourse structure, together with the background information, to improve the recall and precision of pronoun resolution in stories.
We use three stories from different domains and with different writing styles in our experiments: “The Great Gatsby”, “A Case of Identity”, and “Harry Potter”. Applying our methods improved precision by about 50%.
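
The idea of splitting story text into dialog and narration segments and attributing each dialog to a speaker can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the reporting-verb list, the function names (`split_segments`, `guess_speaker`, `attribute_speakers`), and the rule "the speaker is named in the narration right after the quote" are all assumptions made for illustration, since this record does not give the actual rules. Once a dialog is attributed, first-person pronouns inside it ("I", "my") could be resolved to that speaker.

```python
import re

# Hypothetical reporting verbs; the thesis's actual discourse rules are not given here.
REPORTING_VERBS = ("said", "asked", "replied", "cried")

# Matches text enclosed in straight double quotes (assumed dialog marker).
QUOTE_RE = re.compile(r'"([^"]*)"')


def split_segments(text):
    """Split text into ordered ("dialog", str) and ("narration", str) segments."""
    segments = []
    pos = 0
    for match in QUOTE_RE.finditer(text):
        narration = text[pos:match.start()].strip()
        if narration:
            segments.append(("narration", narration))
        segments.append(("dialog", match.group(1).strip()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


def guess_speaker(narration, characters):
    """Return the first known character adjacent to a reporting verb, else None."""
    for name in characters:
        escaped = re.escape(name)
        for verb in REPORTING_VERBS:
            pattern = rf"\b{verb}\s+{escaped}\b|\b{escaped}\s+{verb}\b"
            if re.search(pattern, narration):
                return name
    return None


def attribute_speakers(text, characters):
    """Pair each dialog with the speaker guessed from the narration that follows it."""
    segments = split_segments(text)
    attributed = []
    for i, (kind, content) in enumerate(segments):
        if kind != "dialog":
            continue
        has_next = i + 1 < len(segments) and segments[i + 1][0] == "narration"
        following = segments[i + 1][1] if has_next else ""
        attributed.append((content, guess_speaker(following, characters)))
    return attributed


story = '"I am tired," said Holmes. "My stepfather is away," replied Mary.'
print(attribute_speakers(story, ["Holmes", "Mary"]))
```

A rule this simple fails when the speaker is introduced before the quote or is left implicit in an alternating conversation, which is where a list of known characters and their relations would have to do more of the work.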

Table of Contents
Chapter 1: Introduction
1 The Task
1.1 General overview
1.2 Pronoun tackled
1.3 Aims of the system
2 Terminology
3 Related Work
3.1 Named Entity Recognition
3.2 Entity linking
3.3 Entity types
3.4 Co-reference and Anaphora
4 Outline
Chapter 2: Methodology
1 Method 1: Adding self-annotated data to the state-of-the-art method
1.1 System Input
1.2 System Overview
1.3 Preprocessing
1.4 Adding Self-Annotated data to the clusters
2 Method 2: Semantic Relatedness
2.1 Stanford Co-Reference Cluster Breaking
2.2 Semantic Annotation
Table 2: character relations (Columns A and C give the character names; Column B gives the relation of the character in Column A to the character in Column C)
2.3 Computing Semantic Annotation for Short Sentences
2.3.2 Text Similarity method
2.3.6 Overall Sentence Similarity
3 Method 3: Splitting the text using discourse structure
3.1 Text splitting
3.1 Procedure for Text splitting
3.2 Text Combination
4 Method 4: Speaker Detection
4.1 Reflexive and Possessive pronoun resolution
5 Method 5: Combining the Personal Pronoun Resolved with Stanford parser
6 Method 6: Semantic Relatedness of Noun Phrase detected by Stanford using character relations
7 Method 7: Adding self-annotated data to result text splitting result
8 Method 8: Adding self-annotated data to result of method 5
Chapter 3: Experiment Evaluation
1 Self-Annotated Data Set
2 Metrics
3 Case 1: A Case of Identity
3.1 Pronoun Resolution using state-of-the-art
3.2 Method 1: Adding self-annotated data to the state-of-the-art method
3.3 Method 2: Semantic Relatedness
3.4 Method 3: Splitting the text using discourse structure
3.5 Method 5: Combining the Personal Pronoun Resolved with Stanford parser
3.6 Method 6: Semantic Relatedness of Noun Phrase detected by Stanford using character relations
3.7 Method 7: Adding self-annotated data to result text splitting result
3.8 Method 8: Adding self-annotated data to result of method 5
4 Case 2: The Great Gatsby
4.1 Pronoun Resolution using state-of-the-art
4.2 Method 1: Adding self-annotated data to the state-of-the-art method
4.3 Method 2: Semantic Relatedness
4.4 Method 3: Splitting the text using discourse structure
4.5 Method 5: Combining the Personal Pronoun Resolved with Stanford parser
4.6 Method 6: Semantic Relatedness of Noun Phrase detected by Stanford using character relations
4.7 Method 7: Adding self-annotated data to result text splitting result
4.8 Method 8: Adding self-annotated data to result of method 5
5 Case 3: Harry Potter
5.1 Pronoun Resolution using state-of-the-art
5.2 Method 1: Adding self-annotated data to the state-of-the-art method
5.3 Method 2: Semantic Relatedness
5.4 Method 3: Splitting the text using discourse structure
5.5 Method 4: Speaker Detection
5.6 Method 5: Combining the Personal Pronoun Resolved with Stanford parser
5.7 Method 6: Semantic Relatedness of Noun Phrase detected by Stanford using character relations
5.8 Method 7: Adding self-annotated data to result text splitting result
5.9 Method 8: Adding self-annotated data to result of method 5
6 Error Analysis
Chapter 4: Discussion
Chapter 5: Conclusion
References
Appendix
Appendix A: Original Text Format
Appendix B: Dialog Clusters
Appendix C: Dialogs (conversation or narrative conversation of a speaker)
Dialog with one-to-one Conversation
Dialog with itself
Appendix D: Pronoun Resolver Using Splitting of Text
Appendix E: Stanford Resolver


                                


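Full-text release date: this full text is not authorized for public release (campus network)
Full-text release date: this full text is not authorized for public release (off-campus network)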
