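| Student | 陳維德 Wei-Teh Chen |
|---|---|
| Thesis title | 統計式機器翻譯之句法式辭彙重排 Learning Syntactical Reordering of Source Sentences for Statistical Machine Translation |
| Advisor | 張俊盛 Jason S. Chang |
| Oral defense committee | |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
| Year of publication | 2007 |
| Academic year of graduation | 95 (ROC calendar, i.e., 2006–2007) |
| Language | English |
| Number of pages | 60 |
| Keywords (Chinese) | 詞彙排序 (word reordering), 統計式機器翻譯 (statistical machine translation), 句法剖析樹 (syntactic parse tree) |
| Keywords (English) | Syntactical Reordering, Statistical Machine Translation, Parse Tree |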
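In this thesis, we propose a syntactic reordering model that is learned from the syntactic parse trees of source sentences. Our method first parses the source sentences with a syntactic parser and uses a word alignment tool to produce word-alignment information from a sentence-aligned bilingual corpus. For each resulting parse tree, we use the target-sentence word positions aligned to each node to determine how that node should be reordered into target-language order, and we learn a probability function for this reordering. The probability function is learned automatically by feeding various features of the tree nodes, together with their annotated order labels, into a machine learning tool. At test time, we first parse the test sentence into a parse tree and then use the trained model to predict reordering probabilities for the nodes of the tree, producing a reordered parse tree and a reordered sentence formed from the tree's leaf nodes. Finally, we feed the reordered sentence into an off-the-shelf machine translation system to produce the translation.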
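To make the labeling step concrete, here is a minimal Python sketch, under our own assumptions, of how a reordering label could be derived for each tree node from the word alignment. It assumes a hypothetical `Node` class, an alignment given as a set of (source index, target index) pairs, a node's target position defined as the mean index of the target words aligned to its span, and a node label defined as the permutation that sorts its children by those positions; nodes with an unaligned child are treated as monotone. The feature tuple (node label plus child labels) is likewise illustrative, since the abstract does not specify the thesis's actual features or learner.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical parse-tree node; a leaf's label is the word itself."""
    label: str
    children: list[Node] = field(default_factory=list)
    start: int = 0  # index of the first source word covered by this node
    end: int = 0    # index one past the last source word covered

    def is_leaf(self) -> bool:
        return not self.children


def index_spans(node: Node, start: int = 0) -> int:
    """Assign a source-word span to every node; return the next free index."""
    if node.is_leaf():
        node.start, node.end = start, start + 1
        return start + 1
    pos = start
    for child in node.children:
        pos = index_spans(child, pos)
    node.start, node.end = start, pos
    return pos


def target_position(node: Node, alignment: set[tuple[int, int]]) -> Optional[float]:
    """Mean target index aligned to the node's span, or None if nothing is aligned."""
    tgt = [t for s, t in alignment if node.start <= s < node.end]
    return sum(tgt) / len(tgt) if tgt else None


def reorder_label(node: Node, alignment: set[tuple[int, int]]) -> tuple[int, ...]:
    """Permutation of child indices that orders the children by target position."""
    positions = [target_position(c, alignment) for c in node.children]
    if any(p is None for p in positions):
        return tuple(range(len(positions)))  # assumption: unaligned child -> monotone
    return tuple(sorted(range(len(positions)), key=lambda i: (positions[i], i)))


def training_examples(root: Node, alignment: set[tuple[int, int]]):
    """Collect (features, label) pairs for every node with two or more children."""
    index_spans(root)
    examples = []
    stack = [root]
    while stack:  # pre-order traversal
        node = stack.pop()
        if len(node.children) >= 2:
            features = (node.label, tuple(c.label for c in node.children))
            examples.append((features, reorder_label(node, alignment)))
        stack.extend(reversed(node.children))
    return examples


if __name__ == "__main__":
    # Toy pair: English "eat at home" -> Chinese "在 家 吃".
    tree = Node("VP", [
        Node("VB", [Node("eat")]),
        Node("PP", [Node("IN", [Node("at")]), Node("NN", [Node("home")])]),
    ])
    alignment = {(0, 2), (1, 0), (2, 1)}  # (source index, target index)
    for example in training_examples(tree, alignment):
        print(example)
```

For this toy pair, the VP node receives the inverted label `(1, 0)` (the PP moves before the verb) and the PP node the monotone label `(0, 1)`; pairs like these are what a classifier would be trained on.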
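We implemented the method, selected training and test data from a Hong Kong corpus, and compared phrase-based statistical machine translation with and without our reordering model to determine whether the model improves translation. Evaluated by BLEU score, the system using our model scored 1% higher than the system without it. The experiments show that our method benefits word reordering in statistical machine translation and improves translation quality.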
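Since the comparison is made in BLEU, a compact corpus-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) may help in reading the score. This is our own minimal Python sketch assuming tokenized input, one reference per sentence, and no smoothing; it is not the evaluation script used in the thesis, and standard toolkits add details such as multiple references. The score lies in [0, 1] and is commonly reported as a percentage.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[list[str]], references: list[list[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per hypothesis and no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # hypothesis n-grams, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter overall than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

For example, scoring the hypothesis `the cat sat on the` against the reference `the cat sat on the mat` matches every n-gram, so only the brevity penalty exp(1 - 6/5) applies, giving about 0.819.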
We present a method for learning to perform syntactical reordering of source sentences for machine translation. In our approach, source sentences are parsed, and the resulting parse trees are reordered so that their structure is closer to that of the target language. The method involves aligning words in a parallel corpus, parsing the source sentences into parse trees, determining the reordering operation for each tree node from the word alignment, and training a probability model on tree-node features with a machine learning tool. At run time, we parse the test sentence, use the trained model to estimate the reordering operation for each tree node, and read off the words of the reordered parse tree to obtain the reordered source sentence. We then submit the reordered source sentence to a state-of-the-art machine translation system for evaluation. We describe an implementation of the method using a parallel Hong Kong corpus. The experimental results show that a phrase-based machine translation model combined with our reordering model outperforms the same model without it in terms of BLEU score. These results suggest that our method is a step toward more fluent and grammatical translation.
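The run-time step can be sketched as follows. This is a hypothetical Python illustration: trees are nested `(label, children)` tuples with words as plain strings, and `train_lookup` is a deliberately simple stand-in for the trained probability model, returning the most frequent permutation seen for a (node label, child labels) feature and the identity permutation otherwise. The thesis uses a machine-learned model over richer tree-node features; only the bottom-up traversal and the leaf read-out are meant to carry over.

```python
from collections import Counter, defaultdict
from typing import Callable, Union

# Hypothetical tree encoding: a word (str) or a (label, list_of_subtrees) pair.
Tree = Union[str, tuple]
Predictor = Callable[[str, tuple], tuple]


def node_label(tree: Tree) -> str:
    return tree if isinstance(tree, str) else tree[0]


def train_lookup(examples) -> Predictor:
    """Stand-in model: most frequent permutation per (label, child labels) feature."""
    counts = defaultdict(Counter)
    for features, perm in examples:
        counts[features][perm] += 1
    table = {f: c.most_common(1)[0][0] for f, c in counts.items()}

    def predict(label: str, child_labels: tuple) -> tuple:
        # Unseen configurations keep their source order (identity permutation).
        return table.get((label, child_labels), tuple(range(len(child_labels))))

    return predict


def reorder(tree: Tree, predict: Predictor) -> Tree:
    """Return a new tree whose children are permuted bottom-up by the predictor."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [reorder(c, predict) for c in children]
    if len(children) >= 2:
        perm = predict(label, tuple(node_label(c) for c in children))
        children = [children[i] for i in perm]
    return (label, children)


def leaves(tree: Tree) -> list:
    """Words at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


if __name__ == "__main__":
    predict = train_lookup([(("VP", ("VB", "PP")), (1, 0))])
    tree = ("S", [("NP", [("PRP", ["we"])]),
                  ("VP", [("VB", ["eat"]),
                          ("PP", [("IN", ["at"]), ("NN", ["home"])])])])
    print(" ".join(leaves(reorder(tree, predict))))  # we at home eat
```

Joining the leaves with spaces yields the reordered source sentence (here in a Chinese-like verb-final order) that would then be passed to the phrase-based translation system.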