A Graph-based Automatic Paraphrase Extraction Algorithm | National Tsing Hua University Theses and Dissertations Repository

Graduate student: 陳臆淳 Chen, Yi-Chun
Thesis title: A Graph-based Automatic Paraphrase Extraction Algorithm
Advisor: 張俊盛 Chang, Jason S.
Oral examination committee: 陳信希 Chen, Hsin-Hsi; 張嘉惠 Chang, Chia-Hui
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar, 2012–2013)
Language: English
Number of pages: 54
Keywords: Paraphrase generation, Graph-based method, Weighted PageRank Algorithm, Linguistically motivated feature

Paraphrases are alternative ways to express the same meaning, and automatically generating them benefits many Natural Language Processing tasks. We propose a method for generating paraphrases that preserve both the meaning and the syntax of a given phrase. In our approach, the paraphrasing problem is transformed into a graph representing direct and indirect paraphrase relations. The method incorporates various linguistically motivated features to measure the similarity between paraphrase candidates, and uses the Weighted PageRank Algorithm to evaluate how relevant each candidate is to the original phrase. Human evaluation on a set of phrases commonly used in research articles shows that our method significantly outperforms the state-of-the-art methods, improving both syntactic and semantic precision.
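The pipeline described above (build a weighted graph of paraphrase candidates, then rank them with a weighted PageRank) can be illustrated with a small Python sketch. It assumes the linguistically motivated features have already been combined into a single similarity weight per edge, and it uses a generic edge-weighted PageRank that restarts at the source phrase (personalized PageRank). This is a swapped-in variant, not necessarily the thesis's formulation or the in-link/out-link weighting of Xing and Ghorbani (2004); the function names, phrases, and weights are all hypothetical.

```python
from collections import defaultdict

# Sketch only: a generic personalized, edge-weighted PageRank.
# Function names, phrases, and weights are hypothetical illustrations.


def weighted_pagerank(edges, source, damping=0.85, iterations=100):
    """Edge-weighted PageRank that restarts at `source`.

    edges: dict mapping (phrase_a, phrase_b) -> positive similarity weight.
           Edges are treated as undirected.
    Returns a dict mapping every phrase in the graph to a score; scores sum to 1.
    """
    # Symmetric weighted adjacency: adj[u][v] = similarity between u and v.
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    if source not in adj:
        raise ValueError(f"source phrase {source!r} is not in the graph")

    # Total edge weight leaving each node, used to normalise transitions.
    out_weight = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    restart = {n: (1.0 if n == source else 0.0) for n in adj}
    score = dict(restart)

    for _ in range(iterations):
        # Teleport mass always returns to the source phrase.
        new = {n: (1.0 - damping) * restart[n] for n in adj}
        # Each node spreads its score to neighbours in proportion to edge weight.
        for u, nbrs in adj.items():
            for v, w in nbrs.items():
                new[v] += damping * score[u] * w / out_weight[u]
        score = new
    return score


def rank_paraphrases(edges, source, top_k=3):
    """Return the top_k candidate phrases (excluding the source) by score."""
    score = weighted_pagerank(edges, source)
    candidates = [n for n in score if n != source]
    return sorted(candidates, key=lambda n: score[n], reverse=True)[:top_k]


# Hypothetical similarity weights; "with the aim of" has no direct edge to the
# source and is reachable only through intermediate candidates.
EDGES = {
    ("in order to", "so as to"): 0.6,
    ("in order to", "with a view to"): 0.3,
    ("so as to", "with the aim of"): 0.5,
    ("with a view to", "with the aim of"): 0.4,
}

if __name__ == "__main__":
    print(rank_paraphrases(EDGES, "in order to"))
    # prints ['so as to', 'with the aim of', 'with a view to']
```

Restarting the random walk at the source phrase keeps the scores focused on relevance to that phrase rather than on global centrality, and it lets an indirectly related candidate such as "with the aim of" outrank a weakly connected direct one.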

ABSTRACT (CHINESE)
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
CHAPTER 2 RELATED WORK
CHAPTER 3 METHOD
3.1 Problem Statement
3.2 Graph Construction
3.3 Paraphrase Generation Framework
3.4 Linguistically Motivated Feature
CHAPTER 4 EXPERIMENTAL SETTING
4.1 Experimental Setting and Tuning
4.2 Paraphrase Generation Methods Compared
4.3 Evaluation Data Sets and their Judgments
4.4 Evaluation Metrics
CHAPTER 5 EVALUATION RESULTS
CHAPTER 6 CONCLUSION AND FUTURE WORK
REFERENCES
Appendix A – Development data set
Appendix B – Test phrases
Appendix C – Sample Output and Judgments

Bannard, C. and Callison-Burch, C. 2005. Paraphrasing with bilingual parallel corpora. In Proceedings of ACL.

Barzilay, R. and McKeown, K. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the 39th Annual Meeting of the ACL, pages 50–57.

Bhagat, R. and Ravichandran, D. 2008. Large scale acquisition of paraphrases for learning surface patterns. In Proceedings of ACL/HLT.

Callison-Burch, C. 2008. Syntactic constraints on paraphrases extracted from parallel corpora. In Proceedings of EMNLP, pages 196–205.

Callison-Burch, C., Koehn, P., and Osborne, M. 2006. Improved statistical machine translation using paraphrases. In Proceedings of HLT/NAACL, pages 17–24.

Carletta, J. 1996. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2):249–254.

Chan, T. P., Callison-Burch, C., and Van Durme, B. 2011. Reranking bilingually extracted paraphrases using monolingual distributional similarity. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, pages 33–42.

Chen, M. H., Huang, S. T., Huang, C. C., Liou, H. C., and Chang, J. S. 2012. PREFER: using a graph-based approach to generate paraphrases for language learning. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pages 80–85.

Cover, T. M. and Thomas, J. A. 1991. Elements of information theory. John Wiley & Sons.

Ganitkevitch, J., Callison-Burch, C., Napoles, C., and Van Durme, B. 2011. Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation. In Proceedings of EMNLP.

Ganitkevitch, J., Van Durme, B., and Callison-Burch, C. 2012. Monolingual distributional similarity for text-to-text generation. In Proceedings of *SEM. Association for Computational Linguistics.

Järvelin, K. and Kekäläinen, J. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4):422–446.

Koehn, P. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of the 10th Machine Translation Summit.

Koehn, P., Och, F. J., and Marcu, D. 2003. Statistical phrase-based translation. In Proceedings of HLT/NAACL.

Kok, S. and Brockett, C. 2010. Hitting the right paraphrases in good time. In Proceedings of NAACL/HLT, pages 145–153.

Landis, J. R. and Koch, G. G. 1977. The measurement of observer agreement for categorical data. Biometrics, 33:159–174.

Lin, D. and Pantel, P. 2001. Discovery of inference rules for question answering. In Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 323–328.

Madnani, N. and Dorr, B. 2010. Generating phrasal and sentential paraphrases: A survey of data-driven methods. Computational Linguistics, 36(3):341–388.

Madnani, N., Ayan, N. F., Resnik, P., and Dorr, B. 2007. Using paraphrases for parameter tuning in statistical machine translation. In Proceedings of the ACL Workshop on Statistical Machine Translation.

McKeown, K. R., Barzilay, R., Evans, D., Hatzivassiloglou, V., Klavans, J. L., Nenkova, A., Sable, C., Schiffman, B., and Sigelman, S. 2002. Tracking and summarizing news on a daily basis with Columbia’s Newsblaster. In Proceedings of HLT, pages 280–285.

Och, F. J. and Ney, H. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

Page, L., Brin, S., Motwani, R., and Winograd, T. 1999. The PageRank citation ranking: bringing order to the web. Technical Report 1999-66, Stanford University InfoLab.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. 2002. Numerical recipes in C++. Cambridge University Press, Cambridge, UK.

Riezler, S., Vasserman, A., Tsochantaridis, I., Mittal, V., and Liu, Y. 2007. Statistical machine translation for query expansion in answer retrieval. In Proceedings of ACL.

Szpektor, I., Shnarch, E., and Dagan, I. 2007. Instance based evaluation of entailment rule acquisition. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 456–463.

Tsuruoka, Y., Tateishi, Y., Kim, J. D., Ohta, T., McNaught, J., Ananiadou, S., and Tsujii, J. 2005. Developing a robust part-of-speech tagger for biomedical text. In Advances in Informatics - 10th Panhellenic Conference on Informatics, LNCS 3746, pages 382–392.

Voorhees, E. M. and Tice, D. M. 1999. The TREC-8 question answering track evaluation. In Proceedings of the Eighth Text REtrieval Conference (TREC-8), pages 84–106.

Xing, W. and Ghorbani, A. 2004. Weighted PageRank algorithm. In Proceedings of the 2nd Annual Conference on Communication Networks and Services Research, pages 305–314.

Zhao, S., Wang, H., Liu, T., and Li, S. 2008. Pivot approach for extracting paraphrase patterns from bilingual corpora. In Proceedings of ACL/HLT.

Full-text availability: Not authorized for public release (on-campus and off-campus networks)
