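Phrase Correspondence Extraction in Bilingual Corpora Based on Phrase Translation Model (片語翻譯模型為本之片語翻譯對應擷取) | National Tsing Hua University Theses and Dissertations Repository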

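Graduate student:	FAN, WEI-GENG (范唯耕)
Thesis title:	Phrase Correspondence Extraction in Bilingual Corpora Based on Phrase Translation Model (片語翻譯模型為本之片語翻譯對應擷取)
Advisor:	張俊盛
Committee members:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication:	2005
Academic year of graduation:	93 (ROC calendar)
Language:	English
Number of pages:	90
Chinese keywords:	機器翻譯 (machine translation), 片語翻譯 (phrase translation)
English keywords:	machine translation, phrase translation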

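In this thesis, we propose a new algorithm and new probability parameters to improve the statistical phrase alignment and translation model proposed by Yu (2001), and we extend the model to extract phrase translations from bilingual sentences. Yu's model decomposes the translation probability of the Brown et al. (1993) model into two probability functions, the lexical translation probability and the phrase alignment probability, and uses them to handle phrase alignment and translation. In the training stage, instead of the EM algorithm adopted by Yu, we use the Co-Training algorithm so that the two probability models can be trained more independently, each less affected by the other. We also add part-of-speech parameters to the phrase alignment probability model so that differences in translation alignment caused by part of speech can be estimated properly. For the phrase translation extraction application, we aim to extract phrase translations efficiently from bilingual corpora, providing a large quantity of well-translated phrase data that can improve the quality of machine translation systems or help translators improve translation quality and efficiency.
We implemented the method and trained it with the Co-Training algorithm on about one million Chinese-English bilingual term entries from the academic terminology database of the National Institute for Compilation and Translation (國立編譯館). Evaluated with the method of Och et al. (2000), the model achieved 96% precision and 94% recall. We also drew test data from the Sinorama magazine (光華雜誌) Chinese-English parallel corpus to evaluate how well the model extracts phrase translations from bilingual sentences, obtaining 86% precision and 76% recall. Under the same conditions, all of our results, on both phrases and sentences, were better than those obtained with IBM Model 4. This shows that, within a statistical framework, our approach can outperform a word-based approach on both phrases and sentences.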
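The precision and recall figures reported above can be illustrated with a minimal sketch. It assumes the sure/possible link formulation commonly associated with Och and Ney's alignment evaluation; the exact metric used in the thesis is not given in this record, and the function name `alignment_scores` and the link representation (pairs of source and target positions) are hypothetical.

```python
def alignment_scores(predicted, sure, possible=None):
    """Return (precision, recall, AER) for a set of predicted alignment links.

    Assumption: links are (source_index, target_index) pairs, and the gold
    standard distinguishes "sure" links from "possible" links, with every
    sure link also counted as possible.
    """
    predicted = set(predicted)
    sure = set(sure)
    # Possible links always include the sure links.
    possible = sure if possible is None else set(possible) | sure

    if not predicted or not sure:
        raise ValueError("predicted and sure link sets must be non-empty")

    hits_possible = len(predicted & possible)
    hits_sure = len(predicted & sure)

    precision = hits_possible / len(predicted)
    recall = hits_sure / len(sure)
    # Alignment error rate: lower is better.
    aer = 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))
    return precision, recall, aer


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold_sure = {(0, 0), (1, 1), (2, 2)}
    gold_possible = {(2, 3)}
    system = {(0, 0), (1, 1), (2, 3)}
    print(alignment_scores(system, gold_sure, gold_possible))
```

When no possible links are annotated, the sketch reduces to ordinary precision and recall against a single gold set.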

We introduce a method for extracting phrase correspondences from bilingual sentences in a bilingual corpus using a phrase-based translation model.
In our approach, the source and target sentences are transformed into a set of phrase pairs, from which we select the phrase correspondences with maximum phrase translation probability. This probability combines the lexical translation probability (LTP) and the phrase alignment probability (PAP). The method involves estimating PAP from phrases hand-tagged with phrase alignment information, learning LTP from PAP and a bilingual phrase lexicon, and iteratively co-training LTP and PAP.
At run time, for each phrase in the source sentence, we generate a set of candidate translations from the target sentence, compute the probability of each candidate with the phrase-based translation model, and select the candidate with maximum probability as the correspondence.
We implemented the proposed method using the bilingual phrase lexicon of the National Institute for Compilation and Translation and the parallel corpus of Sinorama magazine. A comparative evaluation on phrases in a set of bilingual sentences randomly chosen from the parallel corpus shows that our model outperforms IBM Model 4.
Experimental results show that the proposed method improves the extraction of phrase correspondences in bilingual corpora and can provide valuable phrase-level translation knowledge for machine translation and computer-assisted translation systems.
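The run-time step described in the abstract (generate candidates from the target sentence, score each with the phrase translation model, keep the maximum) might look roughly like the sketch below. It is an illustration under stated assumptions, not the thesis's implementation: candidates are taken to be contiguous target spans, the score is taken to be PAP times the product of word-level LTPs, and the table layouts `ltp[(source_word, target_word)]` and `pap[(m, n, alignment)]`, along with all function names and probability values, are hypothetical.

```python
from itertools import product
from math import prod


def candidate_spans(target_tokens, max_len):
    """Yield every contiguous target span of at most max_len tokens."""
    n = len(target_tokens)
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            yield tuple(target_tokens[start:end])


def phrase_score(source_phrase, candidate, ltp, pap):
    """Best PAP * product-of-LTP score over all word alignments.

    Assumption: an alignment maps each source position to one target
    position, and pap is keyed by (source_len, target_len, alignment).
    """
    m, n = len(source_phrase), len(candidate)
    best = 0.0
    for alignment in product(range(n), repeat=m):
        p_align = pap.get((m, n, alignment), 0.0)
        if p_align == 0.0:
            continue
        p_lex = prod(
            ltp.get((word, candidate[j]), 0.0)
            for word, j in zip(source_phrase, alignment)
        )
        best = max(best, p_align * p_lex)
    return best


def extract_correspondence(source_phrase, target_tokens, ltp, pap, max_len=4):
    """Return (best_candidate, score), or (None, 0.0) if nothing scores > 0."""
    scored = [
        (phrase_score(source_phrase, cand, ltp, pap), cand)
        for cand in candidate_spans(target_tokens, max_len)
    ]
    best_score, best_cand = max(scored, key=lambda item: item[0],
                                default=(0.0, None))
    if best_score <= 0.0:
        return None, 0.0
    return best_cand, best_score


if __name__ == "__main__":
    # Toy probabilities, invented for illustration only.
    ltp = {("machine", "機器"): 0.8, ("translation", "翻譯"): 0.9,
           ("machine", "系統"): 0.1}
    pap = {(2, 2, (0, 1)): 0.7, (2, 2, (1, 0)): 0.2}
    target = ["我們", "研究", "機器", "翻譯", "系統"]
    print(extract_correspondence(("machine", "translation"), target, ltp, pap))
```

Enumerating every alignment is exponential in the source phrase length, which is tolerable only for the short phrases (two to four words) that the test sets in the appendices suggest; a real system would presumably restrict the alignment patterns it considers.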

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1  Introduction
1.1  Background
1.2  Motivation
1.3  Phrase Correspondence Extraction
Chapter 2  Related Work
Chapter 3  Phrase Machine Translation
3.1   Problem Statement
3.2   Machine Translation Model
3.2.1  Phrase-Based Translation Model
3.3   Training the Phrase Translation Model
3.3.1  Hand Tagging Bilingual Phrases with Alignment Information
3.3.2  Estimating Phrase Alignment Probability according to the Seed Data
3.3.3  Estimating Lexical Translation Probability
3.3.4  Estimating Phrase Alignment Probability
3.4   Run Time Phrase Correspondence Extraction
Chapter 4  Experiments and Analysis
4.1   Training the Phrase Translation Model
4.2   Evaluation Metrics
4.2.1   Metric for Phrase Alignment
4.2.2   Metric for Phrase Correspondence Extraction
4.3   Evaluation Results
4.3.1   Evaluation for Phrase Alignment
4.3.2   Evaluation for Phrase Correspondence Extraction
Chapter 5  Conclusion and Future Work
References
Appendix A - Test Set for Phrase Alignment
(a)  100 bilingual phrases with respect to two word English phrases
(b)  200 bilingual phrases with respect to three word English phrases
(c)  200 bilingual phrases with respect to four word English phrases
Appendix B - Test Set for Phrase Correspondence Extraction

                                

Brown, Peter F.; Cocke, John; Della Pietra, Stephen A.; Della Pietra, Vincent J.; Jelinek, Frederick; Lafferty, John D.; Mercer, Robert L. and Roossin, Paul S.: 1990, `A statistical approach to machine translation`, in Computational Linguistics, volume 16(2): 79–85.
Brown, Peter F.; Della Pietra, Stephen A.; Della Pietra, Vincent J. and Mercer, Robert L.: 1993, `The mathematics of machine translation: Parameter estimation`, in Computational Linguistics, volume 19(2): 263–311.
Blum, A., and T. Mitchell: 1998, `Combining labeled and unlabeled data with co-training`, in Proceedings of COLT98, pp. 92–100.
Catizone, R., G. Russell, and S. Warwick: 1989, `Deriving translation data from bilingual texts`, in Proceedings of the First International Lexical Acquisition Workshop, Detroit, USA.
Fung, P.: 1995, `A pattern matching method for finding noun and proper noun translations from noisy parallel corpora`, in Proceedings of ACL-1995, pp. 236–243.
Gale, W., and K. Church: 1991, `Identifying word correspondences in parallel texts`, in Proceedings of the Speech and Natural Language Workshop, pp. 152–157.
Kaji, H., Y. Kida, and Y. Morimoto: 1992, `Learning Translation Templates from Bilingual Text`, in Proceedings of COLING 1992, volume 2, pp. 672–678.
Kupiec, J.: 1993, `An Algorithm for Finding Noun Phrase Correspondences in Bilingual Corpora`, in Proceedings of ACL-1993, pp. 23–30.
Kumano, A., and H. Hirakawa: 1994, `Building an MT dictionary from parallel texts based on linguistic and statistical information`, in Proceedings of COLING 1994, pp. 76–81.
Imamura, K.: 2002, `Application of translation knowledge acquired by hierarchical phrase alignment for pattern-based MT`, in Proceedings of TMI-2002, pp. 74–84.
Koehn, P., and K. Knight: 2003, `Feature-rich Statistical Translation of Noun Phrases`, in Proceedings of ACL-2003, pp. 311–318.
Koehn, P., F. J. Och, and D. Marcu: 2003, `Statistical Phrase-Based Translation`, in Proceedings of HLT/NAACL-2003, pp. 127–133.
Melamed, I. D.: 1995, `Automatic evaluation and uniform filter cascades for inducing N-best translation lexicons`, in Proceedings of the Third Workshop on Very Large Corpora, pp. 184–198.
Mitchell, T.: 1999, `The role of unlabeled data in supervised learning`, in Proceedings of the Sixth International Colloquium on Cognitive Science.
Meyers, Adam, Michiko Kosaka, and Ralph Grishman: 2000, `Chart-based translation rule application in machine translation`, in Proceedings of COLING-2000, pp. 537–543.
Moore, R. C.: 2001, `Towards a simple and accurate statistical approach to learning translational relationships among words`, in Proceedings of ACL-2001, pp. 79–86.
Marcu, D., and W. Wong: 2002, `A Phrase-Based, Joint Probability Model for Statistical Machine Translation`, in Proceedings of EMNLP-2002, pp. 133–139.
Och, F. J., C. Tillmann, and H. Ney: 1999, `Improved alignment models for statistical machine translation`, in Proceedings of EMNLP-WVLC 1999, pp. 20–28.
Och, F. J., and H. Ney: 2000, `A Comparison of Alignment Models for Statistical Machine Translation`, in Proceedings of COLING 2000, pp. 1086–1090.
Jian, J.-Y., Y.-C. Chang, and J.-S. Chang: 2004, `Collocational Translation Memory Extraction Based on Statistical and Linguistic Information`, in ROCLING XV (ROCLING 2004), Taipei, Taiwan.
Slocum, J.: 1985, `A Survey of Machine Translation: Its History, Current Status, and Future Prospects`, in Computational Linguistics, volume 11(1): 1–17.
Smadja, F., K. McKeown, and V. Hatzivassiloglou: 1996, `Translating Collocations for Bilingual Lexicons: A Statistical Approach`, in Computational Linguistics, volume 22(1): 1–38.
Yu, Ta-wei: 2001, `A New Approach to Statistical Translation Model for Phrases`, a thesis presented to National Tsing Hua University for the degree of Master of Computer Science, pp. 1–66.
Tomas, J., and F. Casacuberta: 2003, `Combining phrase-based and template-based aligned models in statistical translation`, in Proceedings of IbPRIA-2003, pp. 1020–1031.
Wu, D., and X. Xia: 1994, `Learning an English-Chinese lexicon from a parallel corpus`, in Proceedings of AMTA-94, pp. 206–213.
Yamamoto, Kaoru and Yuji Matsumoto: 2000, `Acquisition of phrase-level bilingual correspondence using dependency structure`, in Proceedings of COLING-2000, pp. 933–939.
Yamada, K., and K. Knight: 2001, `A syntax-based statistical translation model`, in Proceedings of ACL-2001, pp. 523–530.
Zens, R., and H. Ney: 2004, `Improvements in Phrase-Based Statistical Machine Translation`, in Proceedings of HLT-NAACL-2004, pp. 257–264.

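Full-text release date: the full text is not authorized for public release (on-campus network)
Full-text release date: the full text is not authorized for public release (off-campus network)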
