
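Graduate student: 陳函斌
Chen, Han-Bin
Thesis title: 學習機器翻譯中的雙語重排序模型
Learning Bilingual Linguistic Reordering Model for Statistical Machine Translation
Advisor: 張俊盛
Chang, Jason S.
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2009
Graduation academic year: 97 (ROC calendar)
Language: English
Number of pages: 44
Keywords (Chinese): 統計式機器翻譯, 重排序模型, 最大熵值法
Keywords (English): statistical machine translation, reordering model, maximum entropy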
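  • In this thesis, we propose an improved reordering model for BTG-based statistical machine translation. The improvement lies mainly in using bilingual information to extract features for a Maximum Entropy model. The method first extracts reordering examples from a large word-aligned parallel corpus and then derives Maximum Entropy features from those examples. The extracted features are used to train a Maximum Entropy reordering model. We tested the approach on Chinese-to-English translation using the test data released by the U.S. NIST in 2006 and 2008, and scored the output with the BLEU metric. The results show that the proposed bilingual reordering model outperforms, by a significant margin in BLEU score, both a phrase-based machine translation system and a BTG translation system that uses bilingual words as features. The main contribution of this thesis is that our method uses a very small number of features yet achieves higher translation quality.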


    In this thesis, we propose a method for learning a reordering model for BTG-based statistical machine translation (SMT). The model focuses on linguistic features extracted from bilingual phrases. Our method extracts reordering examples, together with features such as part-of-speech and word class, from aligned parallel sentences. The features are classified with special consideration of phrase length. We then use these features to train a Maximum Entropy (ME) reordering model. With this model, we performed Chinese-to-English translation tasks. Experimental results show that our bilingual linguistic model significantly outperforms state-of-the-art phrase-based and BTG-based SMT systems, as measured by BLEU score. Compared with previously proposed lexicalized reordering models, our method not only reduces the feature size by a large margin but also improves translation quality.
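The training step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the feature templates, the dictionary keys (`src_pos`, `tgt_pos`), the toy examples, and the plain gradient-ascent trainer are all assumptions made for illustration. With only two outcomes (straight or inverted order of two adjacent blocks), a Maximum Entropy classifier reduces to logistic regression over binary features, which is what the sketch trains.

```python
import math
from collections import defaultdict


def extract_features(left_block, right_block):
    """Build binary feature names from two adjacent blocks.

    Each block is a dict with hypothetical keys 'src_pos' and 'tgt_pos'
    (lists of part-of-speech tags). The templates are illustrative only.
    """
    feats = []
    for name, block in (("L", left_block), ("R", right_block)):
        feats.append(f"{name}.src_first_pos={block['src_pos'][0]}")
        feats.append(f"{name}.src_last_pos={block['src_pos'][-1]}")
        feats.append(f"{name}.tgt_first_pos={block['tgt_pos'][0]}")
        # Bucket the source-side length so short and long phrases differ.
        bucket = "1" if len(block["src_pos"]) == 1 else "2+"
        feats.append(f"{name}.len={bucket}")
    return feats


def prob_inverted(weights, feats):
    """Probability that the two blocks are swapped on the target side."""
    score = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-score))


def train_maxent(examples, epochs=200, lr=0.5):
    """Batch gradient ascent on the log-likelihood of a binary ME model.

    examples: list of (feature_list, label), label 1 = inverted, 0 = straight.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, label in examples:
            p = prob_inverted(weights, feats)
            for f in feats:
                grad[f] += label - p  # observed minus expected count
        for f, g in grad.items():
            weights[f] += lr * g / len(examples)
    return weights


# Toy reordering examples (invented for illustration).
straight_feats = extract_features(
    {"src_pos": ["NN"], "tgt_pos": ["NN"]},
    {"src_pos": ["VV"], "tgt_pos": ["VB"]},
)
inverted_feats = extract_features(
    {"src_pos": ["NN", "DEG"], "tgt_pos": ["IN", "NN"]},
    {"src_pos": ["NN"], "tgt_pos": ["NN"]},
)
examples = [(straight_feats, 0), (inverted_feats, 1)]
weights = train_maxent(examples)
```

In a BTG decoder, a probability such as `prob_inverted(weights, feats)` would be consulted each time two neighboring blocks are merged, to score the straight and inverted rules against each other.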

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Related Work
    Chapter 3 Method
        3.1 Problem Statement
        3.2 Bilingual Linguistic Reordering Model
        3.3 Predicting Reordering in Decoder
    Chapter 4 Experimental Settings
        4.1 Training Models
        4.2 MT Systems Compared
        4.3 Evaluation Metric and Data
    Chapter 5 Results and Analysis
    Chapter 6 Conclusion and Future Works
    References
    Appendix A - Examples of Translation Improvements

    David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of ACL 2005, pp. 263-270.
    Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, Massachusetts.
    Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical Phrase-Based Translation. In Proceedings of HLT/NAACL 2003.
    Philipp Koehn. 2004. Pharaoh: a Beam Search Decoder for Phrase-Based Statistical Machine Translation Models. In Proceedings of AMTA 2004.
    Philipp Koehn, Amittai Axelrod, Alexandra Birch Mayne, Chris Callison-Burch, Miles Osborne and David Talbot. 2005. Edinburgh System Description for the 2005 IWSLT Speech Translation Evaluation. In International Workshop on Spoken Language Translation.
    Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of ACL 2007, Demonstration Session.
    Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. In Proceedings of ACL 2003.
    Shankar Kumar and William Byrne. 2005. Local phrase reordering models for statistical machine translation. In Proceedings of HLT-EMNLP 2005.
    Wei-Yun Ma and Keh-Jiann Chen. 2003. Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff. In Proceedings of ACL, Second SIGHAN Workshop on Chinese Language Processing, pp. 168-171.
    Franz Josef Och. 1999. An efficient method for determining bilingual word classes. In EACL ’99: Ninth Conference of the European Chapter of the Association for Computational Linguistics, pages 71–76, Bergen, Norway, June.
    Franz Josef Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29:19-51.
    Franz Josef Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30:417-449.
    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the ACL, pages 311–318.
    Slav Petrov and Dan Klein. 2007. Improved Inference for Unlexicalized Parsing. In Proceedings of HLT-NAACL 2007.
    Andreas Stolcke. 2002. SRILM – an extensible language modeling toolkit. In Proceedings of the International Conference on Spoken Language Processing, volume 2, pages 901–904.
    Dekai Wu. 1996. A Polynomial-Time Algorithm for Statistical Machine Translation. In Proceedings of ACL 1996.
    Deyi Xiong, Shuanglong Li, Qun Liu, Shouxun Lin, and Yueliang Qian. 2005. Parsing the Penn Chinese treebank with semantic knowledge. In Proceedings of IJCNLP 2005, pages 70-81.
    Deyi Xiong, Qun Liu and Shouxun Lin. 2006. Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation. In Proceedings of ACL-COLING 2006.
    Deyi Xiong, Min Zhang, Aiti Aw, Haitao Mi, Qun Liu, and Shouxun Lin. 2008a. Refinements in BTG-based statistical machine translation. In Proceedings of IJCNLP 2008, pp. 505-512.
    Deyi Xiong, Min Zhang, Ai Ti Aw, and Haizhou Li. 2008b. Linguistically Annotated BTG for Statistical Machine Translation. In Proceedings of COLING 2008.
    Nianwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer. 2005. The Penn Chinese Treebank: Phrase structure annotation of a large corpus. Natural Language Engineering, 11(2):207–238.
    R. Zens, H. Ney, T. Watanabe, and E. Sumita. 2004. Reordering Constraints for Phrase-Based Statistical Machine Translation. In Proceedings of CoLing 2004, Geneva, Switzerland, pp. 205-211.
    Le Zhang. 2004. Maximum Entropy Modeling Toolkit for Python and C++. Available at http://homepages.inf.ed.ac.uk/s0450736/maxent_toolkit.html.
    Dongdong Zhang, Mu Li, Chi-Ho Li and Ming Zhou. 2007. Phrase Reordering Model Integrating Syntactic Knowledge for SMT. In Proceedings of EMNLP-CoNLL 2007.

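    Full-text release date: full text not authorized for public release (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)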