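Graduate student: | Yeh, Mei-Cih (葉玫慈) |
---|---|
Thesis title: | Automatic Generation of Phrasal Paraphrases and Corrections (重述語的自動生成與改錯) |
Advisor: | Chang, Jyun-Sheng (張俊盛) |
Oral defense committee: | Soo, Von-Wun (蘇豐文); Chen, Hao-Jan (陳浩然) |
Degree: | Master |
Department: | |
Year of publication: | 2017 |
Academic year of graduation: | 105 |
Language: | English |
Number of pages: | 49 |
Chinese keywords: | 關鍵詞抽取 (keyword extraction), 重述語產生 (paraphrase generation), 詞向量 (word vectors) |
English keywords: | Synonym Extraction, Paraphrase Generation, Word Embedding |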
We introduce a method for automatically generating phrasal paraphrases from synonyms extracted from a monolingual corpus. In our approach, each content word in a given phrase is replaced with synonyms, and the resulting phrase is validated using n-grams. The method extracts candidate synonym relations using surface patterns and word embeddings, and filters out non-synonyms. At run time, content words in the given phrase are replaced with synonyms to derive candidate paraphrases, which are then re-ranked using multiple statistical synonym measures, word embeddings, and n-gram statistics. We present a prototype paraphrasing system, Rephraser2.0 (http://ironman.nlpweb.org:13142/), that applies the method to a Web-scale corpus. Experimental results indicate that combining surface patterns with word embeddings can automatically generate good-quality paraphrases that are useful for language reference and second-language learning.
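The substitute-then-validate idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synonym table, the n-gram counts, the function-word list, the `min_count` threshold, and the names `candidates` and `paraphrase` are hypothetical and are not taken from the thesis or from Rephraser2.0. The sketch also ranks by raw n-gram count only, whereas the abstract describes re-ranking that combines synonym measures, word embeddings, and n-gram statistics.

```python
from itertools import product

# Toy resources (assumptions for illustration, not data from the thesis).
SYNONYMS = {
    "big": ["large", "huge"],
    "problem": ["issue", "trouble"],
}
NGRAM_COUNTS = {
    ("a", "big", "problem"): 900,
    ("a", "large", "problem"): 400,
    ("a", "huge", "problem"): 350,
    ("a", "big", "issue"): 500,
    ("a", "large", "issue"): 120,
    ("a", "huge", "issue"): 200,
    ("a", "big", "trouble"): 5,
}
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in"}


def candidates(phrase):
    """Yield every variant obtained by swapping content words for synonyms."""
    tokens = phrase.lower().split()
    # Function words keep a single option; content words may be replaced.
    options = [
        [tok] if tok in FUNCTION_WORDS else [tok] + SYNONYMS.get(tok, [])
        for tok in tokens
    ]
    for combo in product(*options):
        if list(combo) != tokens:  # skip the unchanged input phrase
            yield combo


def paraphrase(phrase, min_count=10):
    """Keep candidates attested in the n-gram table, most frequent first."""
    scored = [(NGRAM_COUNTS.get(c, 0), " ".join(c)) for c in candidates(phrase)]
    kept = [(count, text) for count, text in scored if count >= min_count]
    return [text for count, text in sorted(kept, key=lambda pair: -pair[0])]


print(paraphrase("a big problem"))
```

In this toy setup, "a big trouble" is generated as a candidate but filtered out because its count falls below the threshold, which mirrors the role the abstract assigns to n-gram validation.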