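Learning Syntactical Reordering of Source Sentences for Statistical Machine Translation (統計式機器翻譯之句法式辭彙重排) | National Tsing Hua University Electronic Theses and Dissertations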


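Graduate student:	陳維德 Wei-Teh Chen
Thesis title:	統計式機器翻譯之句法式辭彙重排 Learning Syntactical Reordering of Source Sentences for Statistical Machine Translation
Advisor:	張俊盛 Jason S. Chang
Oral examination committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication:	2007
Academic year of graduation:	95 (ROC calendar)
Language:	English
Number of pages:	60
Chinese keywords:	詞彙排序 (word reordering), 統計式機器翻譯 (statistical machine translation), 句法剖析樹 (syntactic parse tree)
English keywords:	Syntactical Reordering, Statistical Machine Translation, Parse Tree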


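In this thesis, we propose a module that learns syntactical word reordering from the parse trees of source sentences. Our method first parses each source sentence with a syntactic parser and uses a word alignment tool to produce word alignment information from sentence-aligned bilingual data. For each resulting parse tree, we use the target-sentence word positions aligned to each node to determine a probability function for reordering the node into target-language order. This probability function is learned automatically by feeding the features of the parse tree nodes, together with their annotated order labels, into a machine learning tool. At test time, we first parse the test sentence to obtain its parse tree, then use the trained model to predict reordering probabilities for the nodes of the tree, producing a reordered parse tree and the reordered sentence formed by its leaf nodes. Finally, we feed the reordered sentence into an off-the-shelf machine translation system to produce the translation.
We implemented the method, selected training and test data from a Hong Kong corpus, and compared phrase-based statistical machine translation with and without our word reordering model to see whether translation quality improves. Evaluated by BLEU score, the system with our module scored 1% higher than the system without it. The experiments show that our method benefits word reordering in statistical machine translation and improves translation quality.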

We present a method for learning to perform syntactical reordering of source sentences for statistical machine translation. In our approach, source sentences are parsed, and each source parse tree is reordered into a tree closer to the structure of the target language. Training involves aligning words, parsing source sentences into parse trees, determining the reordering operation for each tree node, and training a probability model on tree node features with a machine learning model. At run time, we parse the test sentence to obtain its parse tree, estimate the reordering operation for each tree node using the trained model, and read off the word sequence of the reordered parse tree to obtain the reordered source sentence. We then submit the reordered source sentence to a state-of-the-art machine translation system for evaluation. We describe an implementation of the method using a parallel Hong Kong corpus. The experimental results show that a phrase-based machine translation model with our reordering model outperforms the same model without it in terms of BLEU score. Our method is a step toward producing more fluent and grammatical translations.
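The pipeline in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the `Node` class, the two labels `keep` and `swap`, the restriction to binary nodes, and the mean-aligned-position heuristic are all assumptions made for illustration. The thesis trains a probability model (a CRF, according to its table of contents) on node features; here the alignment-derived labels are applied directly in place of a trained model, and the tokens `A`, `B`, `C` are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Node:
    """Hypothetical parse-tree node; leaves carry a word and its source position."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None
    index: Optional[int] = None  # source position (leaves only)

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def mean_target_position(node: Node, alignment: Dict[int, Set[int]]) -> Optional[float]:
    """Average target position aligned to the words under `node`, or None if unaligned."""
    positions = [t for leaf in node.leaves() for t in alignment.get(leaf.index, ())]
    return sum(positions) / len(positions) if positions else None


def label_operation(node: Node, alignment: Dict[int, Set[int]]) -> str:
    """Assumed labeling rule: 'swap' a binary node whose right child aligns earlier."""
    if len(node.children) != 2:
        return "keep"
    left, right = (mean_target_position(c, alignment) for c in node.children)
    if left is None or right is None:
        return "keep"
    return "swap" if right < left else "keep"


def reorder(node: Node, decide: Callable[[Node], str]) -> Node:
    """Return a reordered copy of the tree; `decide` stands in for the trained model."""
    children = [reorder(child, decide) for child in node.children]
    if decide(node) == "swap":
        children.reverse()
    return Node(node.label, children, node.word, node.index)


def leaf(word: str, index: int) -> Node:
    return Node("TOK", word=word, index=index)


# Toy source sentence "A B C" with tree (S A (Y B C)).
tree = Node("S", [leaf("A", 0), Node("Y", [leaf("B", 1), leaf("C", 2)])])
# Toy word alignment: source position -> set of target positions.
alignment = {0: {2}, 1: {0}, 2: {1}}

root_label = label_operation(tree, alignment)
reordered = reorder(tree, lambda n: label_operation(n, alignment))
reordered_words = [node.word for node in reordered.leaves()]
print(root_label, reordered_words)
```

In this toy case the root is labeled `swap`, because its right subtree aligns to earlier target positions than its left child, and the leaves of the reordered tree give the reordered source sentence. In a setup like the one the abstract describes, the `decide` callback would be replaced by predictions from the trained model, and the resulting sentence would be passed to the translation system.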

Chinese Abstract    i
Abstract    ii
Acknowledgements    iii
Table of Contents    iv
List of Tables    v
List of Figures    vi
Chapter 1 Introduction    1
Chapter 2 Related Work    4
Chapter 3 Syntactical Reordering Model    8
3.1 Language Model in Machine Translation    8
3.2 Problem Statement    10
3.3 Training Process    11
3.3.1 Preprocessing for Training Data    12
3.3.2 Determining Word Reordering Operation    14
3.3.3 Training a CRF Model Using All Features    17
3.4 Run-Time Reordering Estimation    23
Chapter 4 Experimental Results    25
4.1 Experimental Setup    25
4.2 Determine the Reordering Operations    28
4.3 Testing Data and Evaluation Metrics    28
4.4 Evaluation Results    29
Chapter 5 Conclusion and Future Work    31
Reference    34
Appendix A - Features List    36
Appendix B - Samples of Results    40

                                


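Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)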

