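| Graduate student: | 黃婕雅 Huang, Chieh Ya |
|---|---|
| Thesis title: | 利用網路連續詞統計之同義詞與重述語的自動產生方法 Automatic Generation of Synonyms and Paraphrases based on Web Grams |
| Advisor: | 張俊盛 Jason S. Chang |
| Oral defense committee: | 張智星, 劉顯仲, 陳浩然 |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
| Year of publication: | 2016 |
| Academic year of graduation: | 104 |
| Language: | English |
| Number of pages: | 42 |
| Keywords (translated from Chinese): | synonym extraction, paraphrase generation, information retrieval |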
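A paraphrase expresses the same meaning in different words. Paraphrasing has important applications in both English writing instruction and natural language processing. This thesis proposes a statistical method, based on Web n-grams, for retrieving synonyms and paraphrases. The method first extracts conjunction-based syntactic patterns from a large-scale Web corpus, then filters the candidate words with statistical measures such as rank ratio, overlap coefficient, and mutual information to find synonyms. Starting from the synonyms of English words, we use a linguistic search engine to expand them into phrasal paraphrases, and we further filter these paraphrases with a statistical classifier. We annotated nearly 200 English phrases to train the classifier. In our experiments, we applied the proposed system to a large-scale Web corpus to retrieve synonyms and paraphrases. The results show that the proposed method can effectively retrieve synonyms and paraphrases.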
A paraphrase expresses the same semantic content using different words. The use of paraphrases has been widely discussed in the literature on both teaching English writing and Natural Language Processing (NLP). In this thesis, we introduce a new method for extracting synonyms and paraphrases of a given word or phrase based on Web-scale n-grams. In our approach, we use surface patterns to extract trigrams from the Web, and filter out noise using rank ratio, overlap coefficient, and pointwise mutual information (PMI). We then derive phrasal paraphrases from the refined synonyms. In our experiments, we applied the system to find phrase-level paraphrases and trained a classifier on about 200 annotated phrases. The experimental results show that the method has the potential to generate good paraphrases of a given phrase.
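The filtering step named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas or thresholds, so this assumes the standard definitions of PMI and overlap coefficient. The function names, the toy counts, and the cut-off values are all hypothetical, and rank ratio is left out because its exact form is not stated here.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information, log2(P(x, y) / (P(x) * P(y))), from raw counts."""
    if min(count_xy, count_x, count_y) <= 0 or total <= 0:
        return float("-inf")  # unseen pair: treat as minimally associated
    return math.log2(count_xy * total / (count_x * count_y))


def overlap_coefficient(a, b):
    """Overlap coefficient |A & B| / min(|A|, |B|) of two collections seen as sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def filter_candidates(target, candidates, word_counts, pair_counts, contexts,
                      total, min_pmi=0.0, min_overlap=0.5):
    """Keep candidates that pass both (hypothetical) thresholds for the target word."""
    kept = []
    for cand in candidates:
        score = pmi(pair_counts.get((target, cand), 0),
                    word_counts[target], word_counts[cand], total)
        shared = overlap_coefficient(contexts[target], contexts[cand])
        if score >= min_pmi and shared >= min_overlap:
            kept.append(cand)
    return kept


# Toy counts, invented for illustration only (not taken from the thesis).
word_counts = {"big": 1000, "large": 800, "red": 1200}
pair_counts = {("big", "large"): 40, ("big", "red"): 1}
contexts = {
    "big": {"house", "city", "dog", "problem"},
    "large": {"house", "city", "company"},
    "red": {"car", "dog"},
}

synonyms = filter_candidates("big", ["large", "red"], word_counts,
                             pair_counts, contexts, total=100_000)
print(synonyms)
```

In this toy setting "red" shares enough contexts with "big" to pass the overlap threshold but is rejected by its negative PMI, which shows why combining the two measures can remove candidates that a single measure would let through.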