
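Graduate student: 黃婕雅
Huang, Chieh Ya
Thesis title: 利用網路連續詞統計之同義詞與重述語的自動產生方法
Automatic Generation of Synonyms and Paraphrases based on Web Grams
Advisor: 張俊盛
Jason S. Chang
Committee members: 張智星
劉顯仲
陳浩然
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2016
Academic year of graduation: 104
Language: English
Number of pages: 42
Keywords (Chinese): synonym extraction, paraphrase generation, information retrieval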
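  • A paraphrase expresses the same meaning in different words. Paraphrasing has important applications in both the teaching of English writing and natural language processing. This thesis proposes a statistical method, based on Web n-grams, for retrieving synonyms and paraphrases. The method first extracts conjunction-linked syntactic structures from a large-scale Web corpus, then filters the candidate words with statistical measures such as rank ratio, overlap coefficient, and mutual information to find synonyms. Starting from the synonyms of English words, we use a linguistic search engine to expand to phrasal paraphrases, and we further screen those paraphrases with a statistical classifier. We annotated nearly 200 English phrases to train the classifier. In our experiments, we used the proposed system to retrieve synonyms and paraphrases from a large-scale Web corpus. The experimental results show that the proposed method retrieves synonyms and paraphrases effectively.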


    A paraphrase expresses the same semantic content using different words. The use of paraphrases has been widely discussed in the literature on both teaching English writing and Natural Language Processing (NLP). In this thesis, we introduce a new method for extracting synonyms and paraphrases for a given word or phrase based on Web-scale n-grams. In our approach, we use surface patterns to extract trigrams from Web-scale data, and filter out noise using rank ratio, overlap coefficient, and pointwise mutual information (PMI). Furthermore, we derive phrasal paraphrases from the refined synonyms. In our experiments, we applied the system to find phrase-level paraphrases and trained a classifier on about 200 phrases. The experimental results show that the method has the potential to generate good paraphrases of a given phrase.
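    The abstract names the filtering measures but this record does not give their formulas. The sketch below is therefore only an illustration using the textbook definitions of the overlap coefficient and PMI, not the thesis's implementation. The toy counts, the "x and y" / "x or y" surface pattern, and all function names are assumptions made for this example. Rank ratio is left out because its exact definition is not stated here.

```python
import math
from collections import Counter

# Hypothetical toy trigram counts standing in for Web-scale n-gram data.
TRIGRAM_COUNTS = {
    ("quick", "and", "fast"): 40,
    ("quick", "and", "easy"): 60,
    ("fast", "and", "easy"): 50,
    ("quick", "or", "rapid"): 10,
    ("quick", "brown", "fox"): 25,  # does not match the pattern, so it is skipped
}


def candidate_pairs(trigram_counts, connectives=("and", "or")):
    """Sum counts of (x, y) pairs from trigrams of the form 'x CONNECTIVE y'.

    The connective pattern is an assumed example of a surface pattern.
    """
    pairs = Counter()
    for (x, middle, y), count in trigram_counts.items():
        if middle in connectives:
            pairs[(x, y)] += count
    return pairs


def overlap_coefficient(a, b):
    """Textbook overlap coefficient: |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def pmi(pair_count, x_count, y_count, total):
    """Textbook PMI from raw counts: log2(P(x, y) / (P(x) * P(y)))."""
    return math.log2((pair_count * total) / (x_count * y_count))


if __name__ == "__main__":
    pairs = candidate_pairs(TRIGRAM_COUNTS)
    print(pairs.most_common())
    # Overlap between the sets of words each candidate co-occurs with.
    print(overlap_coefficient({"fast", "easy", "rapid"}, {"easy", "cheap"}))
    print(pmi(pair_count=8, x_count=8, y_count=8, total=64))
```

    In a pipeline of this kind, a candidate pair would presumably be kept only if its scores pass some threshold; any such threshold would need to be tuned on annotated data and is not specified in this record.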

    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    3 Methodology
    3.1 Problem Statement
    3.2 Word-level Paraphrases Extraction
    3.2.1 Extract Trigrams with Linguistic Features
    3.2.2 Filtering Candidate Synonyms with Rank Ratio
    3.2.3 Filtering Synonyms with Overlap Coefficient
    3.3 Phrase-level Paraphrases Generation
    3.4 The System RePhraser 2.0
    4 Experiment and Evaluation
    4.1 Experimental Setting
    4.2 Evaluation of Word-level Paraphrases
    4.3 Evaluation of Phrase-level Paraphrases
    5 Conclusion and Future Work
    Reference
    Appendix
    A Samples of Results of Synonyms Evaluation
    B Samples of Results of Phrasal Paraphrase Evaluation


    Full text: not authorized for public release (on-campus or off-campus network)
