Graduate student: | 徐唯槐 Peter Wei-Huai Hsu |
---|---|
Thesis title: | Mining Domain-Specific Translations on the Web (利用網路搜尋之特定領域術語翻譯) |
Advisor: | 張俊盛 Jason S. Chang |
Degree: | Master's |
Department: | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
Publication year: | 2008 |
Graduation academic year: | 96 (ROC calendar, i.e., 2007–2008) |
Language: | English |
Pages: | 125 |
Keywords: | Machine Translation, Terminology Translation, Web as Corpus, Query Expansion |
We introduce a method for learning to find domain-specific translations for a given term on the Web. In our approach, the source term is transformed into an expanded query aimed at maximizing the probability of retrieving translations from a very large collection of mixed-code documents. The method involves automatically generating sets of target-language words from training data in specific domains and automatically evaluating these target words for their effectiveness in retrieving documents that contain the sought-after translations. At run time, the given term is transformed into an expanded query and submitted to a search engine, and ranked translations are extracted from the document snippets returned by the search engine. We present TermMine, a prototype system that applies the method to Web search engines. Evaluations on a set of terms show that TermMine outperforms state-of-the-art machine translation systems.
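The training step described above, scoring candidate target-language words by how well they help retrieve documents containing known translations, can be sketched as follows. This is a minimal illustration under stated assumptions, not TermMine's implementation: the Web search engine is replaced by a hypothetical in-memory `toy_search` over a few mixed-code snippets, and the score (the fraction of training (term, translation) pairs whose expanded query returns a snippet containing the reference translation) is an illustrative choice rather than the thesis's scoring function.

```python
from collections.abc import Callable, Iterable

# A search function maps a query (list of tokens) to the snippets it returns.
SearchFn = Callable[[list[str]], list[str]]


def toy_search(corpus: list[str]) -> SearchFn:
    """Hypothetical stand-in for a Web search engine.

    Returns every snippet that contains all query tokens as substrings
    (substring matching, since Chinese text has no spaces between words).
    """
    def search(query: list[str]) -> list[str]:
        return [s for s in corpus if all(tok in s for tok in query)]
    return search


def score_expansion_words(
    pairs: Iterable[tuple[str, str]],
    candidates: Iterable[str],
    search: SearchFn,
) -> dict[str, float]:
    """Score each candidate target-language word by the fraction of training
    (term, translation) pairs for which the query [term, word] returns at
    least one snippet containing the reference translation (an assumed metric)."""
    pair_list = list(pairs)
    scores: dict[str, float] = {}
    for word in candidates:
        hits = sum(
            any(translation in s for s in search([term, word]))
            for term, translation in pair_list
        )
        scores[word] = hits / len(pair_list) if pair_list else 0.0
    return scores


def select_top_words(scores: dict[str, float], k: int) -> list[str]:
    """Keep the k highest-scoring words (ties broken by code-point order)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    # Tiny illustrative data for a hypothetical computer-science domain.
    corpus = [
        "資料結構 課程: hash table (雜湊表) 與 binary tree (二元樹)",
        "演算法 筆記: hash table 雜湊表",
        "天氣 預報 hash table",
        "演算法: binary tree",
    ]
    pairs = [("hash table", "雜湊表"), ("binary tree", "二元樹")]
    scores = score_expansion_words(pairs, ["演算法", "資料結構", "天氣"], toy_search(corpus))
    print(scores)                       # {'演算法': 0.5, '資料結構': 1.0, '天氣': 0.0}
    print(select_top_words(scores, 2))  # ['資料結構', '演算法']
```

Here 資料結構 ("data structures") helps retrieve both reference translations, 演算法 ("algorithms") only one, and the off-domain 天氣 ("weather") none, so the first two would be kept as expansion words for this toy domain.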
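In this thesis, we propose a new method for extracting translations of domain-specific terms from the Web. Our method first transforms a source-language term into an expanded query, aiming to increase the chance that the search engine returns documents containing translations so that the relevant translations can be extracted accurately from the document snippets. For each knowledge domain, we train a set of target-language keywords in advance. These domain keywords help us effectively collect documents from the Web that contain domain-relevant translations. At run time, we expand the term to be translated with the domain-relevant keywords into an effective query, submit it to a search engine, and extract the corresponding translations from the search results. We implemented the proposed method in a translation system named TermMine, and experimental and evaluation results show that the method effectively improves the translation of domain-specific terms.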
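The run-time step can be illustrated in the same spirit. In this sketch, which is a simplification under stated assumptions rather than the thesis's extraction method, the term is expanded with previously selected domain words, a hypothetical in-memory `toy_search` returns the matching snippets, and every maximal run of Chinese characters other than the expansion words is counted as a candidate translation and ranked by frequency.

```python
import re
from collections import Counter
from collections.abc import Callable

SearchFn = Callable[[list[str]], list[str]]

# Maximal runs of CJK Unified Ideographs: a crude, assumed candidate extractor.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def toy_search(corpus: list[str]) -> SearchFn:
    """Hypothetical search engine: return snippets containing every query token."""
    def search(query: list[str]) -> list[str]:
        return [s for s in corpus if all(tok in s for tok in query)]
    return search


def rank_translations(
    term: str, domain_words: list[str], search: SearchFn
) -> list[tuple[str, int]]:
    """Expand the term with domain words, search, and rank Chinese
    character runs found in the returned snippets by frequency."""
    query = [term, *domain_words]  # the expanded query
    counts: Counter[str] = Counter()
    for snippet in search(query):
        for cand in CJK_RUN.findall(snippet):
            if cand not in domain_words:  # expansion words are not translations
                counts[cand] += 1
    # Descending frequency, then code-point order for a deterministic result.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    corpus = [
        "資料結構: hash table (雜湊表)",
        "資料結構 教學 hash table 雜湊表",
        "資料結構 hash table 哈希表",
        "hash table 雜湊表",           # lacks the domain word, so not retrieved
        "資料結構 binary tree 二元樹",  # lacks the term, so not retrieved
    ]
    print(rank_translations("hash table", ["資料結構"], toy_search(corpus)))
    # [('雜湊表', 2), ('哈希表', 1), ('教學', 1)]
```

Raw frequency over character runs is deliberately crude: it ranks an unrelated word such as 教學 ("teaching") alongside a genuine variant translation, which is why a practical system needs a better candidate extractor and ranking model than this sketch.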