Student: 范唯耕 (FAN, WEI-GENG)
Thesis title: Phrase Correspondence Extraction in Bilingual Corpora Based on Phrase Translation Model (片語翻譯模型為本之片語翻譯對應擷取)
Advisor: 張俊盛
Degree: Master's
Department: Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar, i.e., 2004–2005)
Language: English
Number of pages: 90
Keywords: machine translation, phrase translation
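    In this thesis, we propose a new algorithm and new probability parameters to improve the statistical phrase alignment and translation model proposed by Yu (2001), and we extend the model to extracting phrase translations from bilingual sentences. Yu's model decomposes the translation probability of the model of Brown et al. (1993) into two probability functions, lexical translation probability and phrase alignment probability, and uses them to handle phrase alignment and translation. In the training stage, instead of the EM algorithm used by Yu, we adopt the Co-Training algorithm so that the two probability models can be trained more independently and are less affected by each other. In addition, we add part-of-speech parameters to the phrase alignment probability model, so that different translation correspondences arising from differences in part of speech can also be estimated appropriately. For the phrase translation extraction application, our goal is to efficiently extract phrase translations from bilingual corpora, providing a large amount of high-quality phrase data that can in turn improve machine translation systems or help translators work more accurately and efficiently.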
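The abstract does not spell out how lexical translation probability (LTP) and phrase alignment probability (PAP) combine. One plausible form, stated here purely as an assumption (a Viterbi-style maximum over alignments, with the part-of-speech conditioning on PAP that the thesis adds), is:

```latex
% Assumed form, not taken from the thesis:
%   e = e_1 ... e_n : English phrase, with POS tags t_1 ... t_n
%   c              : Chinese phrase, segmented into units c_1 ... c_m
%   a              : alignment mapping English position i to Chinese unit a(i)
P(c \mid e) \approx \max_{a} \; \mathrm{PAP}(a \mid t_1 \ldots t_n)
  \prod_{i=1}^{n} \mathrm{LTP}\bigl(c_{a(i)} \mid e_i\bigr)
```

Under this reading, PAP scores the shape of the alignment (for example, whether word order is preserved or inverted) and LTP scores the individual word translations, which would allow the two factors to be estimated separately.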
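    We implemented the method and trained it with the Co-Training algorithm on about one million Chinese–English bilingual term entries from the academic terminology corpus of the National Institute for Compilation and Translation. Evaluated for phrase alignment with the method of Och and Ney (2000), the model achieved 96% precision and 94% recall. We also drew test data from the Sinorama Magazine Chinese–English parallel corpus to evaluate phrase translation extraction from bilingual sentences, obtaining 86% precision and 76% recall. Under the same conditions, all of our results on both phrases and sentences were better than those obtained with IBM Model 4. These results indicate that, within a statistical framework, our approach outperforms a word-based approach for both phrases and sentences.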
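The abstract reports precision and recall following Och and Ney (2000) but does not say whether sure and possible links were distinguished. A minimal sketch of those metrics, assuming Och and Ney's definitions (a sure set S contained in a possible set P) and alignments represented as sets of (source index, target index) pairs; the function name and the toy links are illustrative only:

```python
def alignment_metrics(predicted, sure, possible):
    """Precision, recall, and alignment error rate (AER) in the style of Och and Ney (2000).

    Each argument is a collection of (source_index, target_index) links.
    Sure links are treated as a subset of possible links.
    """
    A = set(predicted)
    S = set(sure)
    P = set(possible) | S  # enforce S ⊆ P
    a_and_p = len(A & P)
    a_and_s = len(A & S)
    precision = a_and_p / len(A) if A else 0.0
    recall = a_and_s / len(S) if S else 0.0
    denom = len(A) + len(S)
    aer = 1.0 - (a_and_s + a_and_p) / denom if denom else 0.0
    return precision, recall, aer


p, r, aer = alignment_metrics(
    predicted={(0, 0), (1, 1), (2, 3)},
    sure={(0, 0), (1, 1)},
    possible={(2, 2)},
)
print(f"precision={p:.3f} recall={r:.3f} AER={aer:.3f}")  # → precision=0.667 recall=1.000 AER=0.200
```

Precision is measured against possible links and recall against sure links, so a system is not penalized for proposing a link that annotators marked as merely possible.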


    We introduce a method for extracting phrase correspondences from bilingual sentences in a bilingual corpus using a phrase-based translation model.
    In our approach, each pair of source and target sentences is transformed into a set of candidate phrase pairs, from which we select the phrase correspondences with the maximum phrase translation probability; this probability combines a lexical translation probability (LTP) and a phrase alignment probability (PAP). The method involves estimating PAP from phrases hand-tagged with phrase alignment information, learning LTP from PAP and a bilingual phrase lexicon, and iteratively co-training LTP and PAP.
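The three training steps can be illustrated with a toy sketch. It makes several assumptions not found in the abstract: each Chinese phrase is pre-segmented into exactly as many units as the English phrase has words, an alignment is a permutation of positions, PAP is a relative frequency over permutation patterns (the part-of-speech conditioning is omitted), and co-training is rendered as each model re-estimating the other from the alignments it prefers. All function names and the smoothing constant are hypothetical.

```python
from collections import Counter
from itertools import permutations

FLOOR = 1e-6  # hypothetical smoothing value for unseen word pairs


def normalize_pap(perm_counts):
    """PAP(pattern | n): relative frequency of each permutation among same-length patterns."""
    totals = Counter()
    for perm, n in perm_counts.items():
        totals[len(perm)] += n
    return {perm: n / totals[len(perm)] for perm, n in perm_counts.items()}


def normalize_ltp(pair_counts):
    """LTP(c | e): relative frequency of Chinese unit c among all links from English word e."""
    totals = Counter()
    for (e, _), n in pair_counts.items():
        totals[e] += n
    return {(e, c): n / totals[e] for (e, c), n in pair_counts.items()}


def label_with_pap(n, pap):
    """PAP view: most probable permutation of length n (identity order if none was seen)."""
    return max(permutations(range(n)), key=lambda perm: pap.get(perm, 0.0))


def label_with_ltp(e_words, c_units, ltp):
    """LTP view: permutation maximizing the product of word translation probabilities."""
    def score(perm):
        p = 1.0
        for i, j in enumerate(perm):
            p *= ltp.get((e_words[i], c_units[j]), FLOOR)
        return p
    return max(permutations(range(len(e_words))), key=score)


def co_train(seed, lexicon, iterations=3):
    """seed: (English words, Chinese units, gold permutation); lexicon: (English words, Chinese units)."""
    seed_counts = Counter(perm for _, _, perm in seed)
    pap = normalize_pap(seed_counts)  # step 1: PAP from hand-tagged seed data
    ltp = {}
    for _ in range(iterations):
        # Step 2: alignments chosen by the PAP view supply the counts for LTP.
        pair_counts = Counter()
        for e, c in lexicon:
            for i, j in enumerate(label_with_pap(len(e), pap)):
                pair_counts[(e[i], c[j])] += 1
        ltp = normalize_ltp(pair_counts)
        # Step 3: alignments chosen by the LTP view, plus the seed labels, re-estimate PAP.
        perm_counts = seed_counts + Counter(label_with_ltp(e, c, ltp) for e, c in lexicon)
        pap = normalize_pap(perm_counts)
    return ltp, pap


seed = [(["machine", "translation"], ["機器", "翻譯"], (0, 1))]
lexicon = [(["phrase", "translation"], ["片語", "翻譯"]),
           (["translation", "model"], ["翻譯", "模型"])]
ltp, pap = co_train(seed, lexicon)
print(ltp[("translation", "翻譯")], pap[(0, 1)])  # → 1.0 1.0
```

In this sketch each model is re-estimated only from alignments chosen by the other view (plus the seed labels for PAP), rather than from a joint posterior as in EM; the abstract does not say whether the thesis's co-training follows this exact scheme.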
    At run time, for each phrase in the source sentence we generate a set of candidate translations from the target sentence, calculate the probability of each candidate using the phrase-based translation model, and select the candidate with the maximum probability as the correspondence.
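The run-time step can be sketched as a search over contiguous spans of the target sentence. This assumes a pre-segmented Chinese sentence, an LTP table keyed by (English word, Chinese word), and a hypothetical length model `length_prob` standing in for the thesis's PAP; the candidate generator, the scoring, and all names are illustrative, not the thesis's exact procedure.

```python
from typing import Dict, List, Optional, Tuple

FLOOR = 1e-6  # hypothetical smoothing value for unseen events


def lexical_score(src: List[str], cand: List[str], ltp: Dict[Tuple[str, str], float]) -> float:
    """Product over source words of the best LTP link into the candidate span."""
    score = 1.0
    for e in src:
        score *= max(ltp.get((e, c), FLOOR) for c in cand)
    return score


def extract_correspondence(
    src: List[str],
    tgt: List[str],
    ltp: Dict[Tuple[str, str], float],
    length_prob: Dict[Tuple[int, int], float],
    max_len: int = 4,
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the (start, end) target span with the highest score, end exclusive."""
    best_span, best_score = None, 0.0
    for start in range(len(tgt)):
        for end in range(start + 1, min(start + max_len, len(tgt)) + 1):
            cand = tgt[start:end]
            score = lexical_score(src, cand, ltp) * length_prob.get((len(src), len(cand)), FLOOR)
            if score > best_score:
                best_span, best_score = (start, end), score
    return best_span, best_score


src = ["phrase", "translation"]
tgt = ["我們", "研究", "片語", "翻譯", "模型"]
ltp = {("phrase", "片語"): 0.9, ("translation", "翻譯"): 0.8, ("model", "模型"): 0.9}
length_prob = {(2, 1): 0.1, (2, 2): 0.6, (2, 3): 0.3}
span, score = extract_correspondence(src, tgt, ltp, length_prob)
print(span, tgt[span[0]:span[1]])  # → (2, 4) ['片語', '翻譯']
```

Because the lexical term never decreases as a span widens, some length or alignment term is needed to keep the search from preferring over-long spans; here that role is played by the hypothetical `length_prob` table.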
    We implemented the proposed method using the bilingual phrase lexicon of the National Institute for Compilation and Translation and the parallel corpus of Sinorama magazine. A comparative evaluation on phrases in a set of bilingual sentences randomly selected from the parallel corpus shows that our model outperforms IBM Model 4.
    The experimental results indicate that the proposed method improves the performance of phrase correspondence extraction from bilingual corpora and can provide valuable phrase-level translation knowledge for machine translation and computer-assisted translation systems.

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background
      1.2 Motivation
      1.3 Phrase Correspondence Extraction
    Chapter 2 Related Work
    Chapter 3 Phrase Machine Translation
      3.1 Problem Statement
      3.2 Machine Translation Model
        3.2.1 Phrase-Based Translation Model
      3.3 Training the Phrase Translation Model
        3.3.1 Hand Tagging Bilingual Phrases with Alignment Information
        3.3.2 Estimating Phrase Alignment Probability according to the Seed Data
        3.3.3 Estimating Lexical Translation Probability
        3.3.4 Estimating Phrase Alignment Probability
      3.4 Run Time Phrase Correspondence Extraction
    Chapter 4 Experiments and Analysis
      4.1 Training the Phrase Translation Model
      4.2 Evaluation Metrics
        4.2.1 Metric for Phrase Alignment
        4.2.2 Metric for Phrase Correspondence Extraction
      4.3 Evaluation Results
        4.3.1 Evaluation for Phrase Alignment
        4.3.2 Evaluation for Phrase Correspondence Extraction
    Chapter 5 Conclusion and Future Work
    References
    Appendix A - Test Set for Phrase Alignment
      (a) 100 bilingual phrases with two-word English phrases
      (b) 200 bilingual phrases with three-word English phrases
      (c) 200 bilingual phrases with four-word English phrases
    Appendix B - Test Set for Phrase Correspondence Extraction

    Blum, A., and T. Mitchell: 1998, ‘Combining labeled and unlabeled data with co-training’, in Proceedings of COLT-98, pp. 92–100.
    Brown, P. F., J. Cocke, S. A. Della Pietra, V. J. Della Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin: 1990, ‘A statistical approach to machine translation’, in Computational Linguistics, 16(2): 79–85.
    Brown, P. F., S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer: 1993, ‘The mathematics of statistical machine translation: Parameter estimation’, in Computational Linguistics, 19(2): 263–311.
    Catizone, R., G. Russell, and S. Warwick: 1989, ‘Deriving translation data from bilingual texts’, in Proceedings of the First International Lexical Acquisition Workshop, Detroit, USA.
    Fung, P.: 1995, ‘A pattern matching method for finding noun and proper noun translations from noisy parallel corpora’, in Proceedings of ACL-1995, pp. 236–243.
    Gale, W., and K. Church: 1991, ‘Identifying word correspondences in parallel texts’, in Proceedings of the Speech and Natural Language Workshop, pp. 152–157.
    Imamura, K.: 2002, ‘Application of translation knowledge acquired by hierarchical phrase alignment for pattern-based MT’, in Proceedings of TMI-2002, pp. 74–84.
    Jian, J.-Y., Y.-C. Chang, and J.-S. Chang: 2004, ‘Collocational translation memory extraction based on statistical and linguistic information’, in Proceedings of ROCLING XV (ROCLING 2004), Taipei, Taiwan.
    Kaji, H., Y. Kida, and Y. Morimoto: 1992, ‘Learning translation templates from bilingual text’, in Proceedings of COLING-1992, volume 2, pp. 672–678.
    Koehn, P., and K. Knight: 2003, ‘Feature-rich statistical translation of noun phrases’, in Proceedings of ACL-2003, pp. 311–318.
    Koehn, P., F. J. Och, and D. Marcu: 2003, ‘Statistical phrase-based translation’, in Proceedings of HLT-NAACL-2003, pp. 127–133.
    Kumano, A., and H. Hirakawa: 1994, ‘Building an MT dictionary from parallel texts based on linguistic and statistical information’, in Proceedings of COLING-1994, pp. 76–81.
    Kupiec, J.: 1993, ‘An algorithm for finding noun phrase correspondences in bilingual corpora’, in Proceedings of ACL-1993, pp. 23–30.
    Marcu, D., and W. Wong: 2002, ‘A phrase-based, joint probability model for statistical machine translation’, in Proceedings of EMNLP-2002, pp. 133–139.
    Melamed, I. D.: 1995, ‘Automatic evaluation and uniform filter cascades for inducing N-best translation lexicons’, in Proceedings of the Third Workshop on Very Large Corpora, pp. 184–198.
    Meyers, A., M. Kosaka, and R. Grishman: 2000, ‘Chart-based translation rule application in machine translation’, in Proceedings of COLING-2000, pp. 537–543.
    Mitchell, T.: 1999, ‘The role of unlabeled data in supervised learning’, in Proceedings of the Sixth International Colloquium on Cognitive Science.
    Moore, R. C.: 2001, ‘Towards a simple and accurate statistical approach to learning translational relationships among words’, in Proceedings of ACL-2001, pp. 79–86.
    Och, F. J., and H. Ney: 2000, ‘A comparison of alignment models for statistical machine translation’, in Proceedings of COLING-2000, pp. 1086–1090.
    Och, F. J., C. Tillmann, and H. Ney: 1999, ‘Improved alignment models for statistical machine translation’, in Proceedings of EMNLP-WVLC-1999, pp. 20–28.
    Slocum, J.: 1985, ‘A survey of machine translation: Its history, current status, and future prospects’, in Computational Linguistics, 11(1): 1–17.
    Smadja, F., K. McKeown, and V. Hatzivassiloglou: 1996, ‘Translating collocations for bilingual lexicons: A statistical approach’, in Computational Linguistics, 22(1): 1–38.
    Tomas, J., and F. Casacuberta: 2003, ‘Combining phrase-based and template-based aligned models in statistical translation’, in Proceedings of IbPRIA-2003, pp. 1020–1031.
    Wu, D., and X. Xia: 1994, ‘Learning an English-Chinese lexicon from a parallel corpus’, in Proceedings of AMTA-94, pp. 206–213.
    Yamada, K., and K. Knight: 2001, ‘A syntax-based statistical translation model’, in Proceedings of ACL-2001, pp. 523–530.
    Yamamoto, K., and Y. Matsumoto: 2000, ‘Acquisition of phrase-level bilingual correspondence using dependency structure’, in Proceedings of COLING-2000, pp. 933–939.
    Yu, T.-W.: 2001, ‘A new approach to statistical translation model for phrases’, Master's thesis, National Tsing Hua University, pp. 1–66.
    Zens, R., and H. Ney: 2004, ‘Improvements in phrase-based statistical machine translation’, in Proceedings of HLT-NAACL-2004, pp. 257–264.

    Full-text availability: not authorized for public release (on-campus or off-campus network).