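Graduate student: | 楊坤儒 Yang, Kun-Ju |
---|---|
Thesis title: | 運用網路語料庫之雙語詞彙對應 Bilingual Word Alignment Using Web as Corpus |
Advisor: | 張俊盛 Chang, Jason S. |
Committee members: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2009 |
Academic year of graduation: | 97 (ROC calendar) |
Language: | Chinese |
Number of pages: | 41 |
Keywords (Chinese): | 詞彙對應 (word alignment), 網路語料庫 (Web as corpus) |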
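In this thesis, we propose a new method that uses Web resources to align words in bilingual parallel sentences. The method first converts the words of a parallel sentence pair into expanded queries and sends them to a search engine to retrieve multiple snippets. It then uses counts gathered from the snippets to identify the words and phrases most likely to be translations of each other. We also compute a set of features needed to identify correct word alignments more precisely. Finally, we use these features to filter, score, and rank the candidates found in the retrieved snippets, and select the words and phrases most likely to be mutual translations. Experimental results show that the method is markedly effective for word alignment and for overcoming the data sparseness problem.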
We introduce a method for aligning words and phrases in a given pair of bilingual sentences using the Web as a corpus. In our approach, each Chinese word, together with all the English words of the sentence pair, is transformed into a query and sent to a search engine to retrieve mixed-code snippets. We use the returned snippets to align words and phrases such that each source word and its aligned word or phrase are likely to be translation counterparts. The method involves calculating features of alignment candidates, then filtering, scoring, and ranking the candidates. Finally, we select the most likely word or phrase alignment for each source word. The results show that the method reaches 53.2% recall and 88.6% precision.
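The filter, score, and rank steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not specify the features, so a plain snippet co-occurrence count stands in for them, and the function names, the `min_count` threshold, the `max_len` phrase limit, and the sample snippets are all hypothetical. The snippets are passed in as strings, so no search engine API is assumed.

```python
from collections import Counter
import re


def candidate_ngrams(english_words, max_len=3):
    """Contiguous word sequences of the English sentence, up to max_len words."""
    cands = []
    for i in range(len(english_words)):
        for j in range(i + 1, min(i + max_len, len(english_words)) + 1):
            cands.append(" ".join(english_words[i:j]))
    return cands


def rank_alignments(chinese_word, english_words, snippets, min_count=2):
    """Rank English candidates by the number of snippets that contain
    both the candidate and chinese_word (an assumed stand-in feature)."""
    cands = set(candidate_ngrams(english_words))
    counts = Counter()
    for snippet in snippets:
        if chinese_word not in snippet:
            continue  # only snippets mentioning the source word count
        text = snippet.lower()
        for cand in cands:
            # Match the candidate only when it is not glued to other Latin
            # letters; Chinese characters and punctuation may be adjacent.
            pattern = r"(?<![a-z])" + re.escape(cand.lower()) + r"(?![a-z])"
            if re.search(pattern, text):
                counts[cand] += 1
    # Filter out rare candidates, then sort by count, longer phrases first on ties.
    kept = [(c, n) for c, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda cn: (-cn[1], -len(cn[0].split())))


# Hypothetical mixed-code snippets, as a search engine might return them.
snippets = [
    "語料庫 (corpus) 是語言學研究的基礎",
    "網路語料庫 Web corpus 的建構",
    "平行語料庫(parallel corpus)與詞彙對應",
    "今天天氣很好 the weather is nice",
]
english_words = "we use the web as a corpus".split()

print(rank_alignments("語料庫", english_words, snippets))
```

With these sample snippets, "corpus" co-occurs with 語料庫 in three snippets and is the only candidate to pass the threshold. A realistic system would replace the raw count with the several features the abstract mentions and would evaluate the selected alignments against a gold standard to obtain precision and recall.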
Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. 1990. A statistical approach to machine translation. Computational Linguistics, 16(2):79-85.
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263-311.
Bonnie J. Dorr. 1993. Machine Translation: A View from the Lexicon. MIT Press.
Pascale Fung and Kathleen McKeown. 1997. Finding terminology translations from non-parallel corpora. In The 5th Annual Workshop on Very Large Corpora, pages 192-202, Hong Kong, Aug.
P. Fung and L. Y. Yee. 1998. An IR approach for translating new words from nonparallel, comparable texts. In Proceedings of the 36th Annual Conference of the Association for Computational Linguistics, pages 414-420.
W. John Hutchins. 1995. Machine Translation: A Brief History. In Koerner, E.F.K. and Asher, R.E. eds. Concise history of the language sciences (Oxford: Pergamon), 431-445.
A. Kilgarriff and G. Grefenstette. 2003. Introduction to the special issue on the web as corpus. Computational Linguistics, 29:333–347.
A. Lopez. 2007. A survey of statistical machine translation. Technical Report UMIACS-TR-2006-47, University of Maryland.
Dekang Lin, Shaojun Zhao, Benjamin Van Durme, and Marius Paşca. 2008. Mining Parenthetical Translations from the Web by Word Alignment. In Proceedings of ACL-08: HLT, pages 994–1002, Columbus, Ohio, USA.
I. D. Melamed. 1997. A Word-to-Word Model of Translational Equivalence. In Proceedings of the 35th Conference of the Association for Computational Linguistics, Madrid, Spain.
Dragos Stefan Munteanu, Alexander Fraser, and Daniel Marcu. 2004. Improved Machine Translation Performance via Parallel Sentence Extraction from Comparable Corpora. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference.
M. Nagata, T. Saito, and K. Suzuki. 2001. Using the Web as a bilingual dictionary. In Proc. of ACL DD-MT Workshop.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.
Li Shao and Hwee Tou Ng. 2004. Mining new word translations from comparable corpora. In Proceedings of Coling 2004, pages 618–624, Geneva, Switzerland.
J.C. Wu, T. Lin, and J.S. Chang. 2007. Learning to Find English to Chinese Transliterations on the Web. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 996–1004.