Author: 徐唯槐 (Peter Wei-Huai Hsu)
Thesis title: Mining Domain-Specific Translations on the Web (利用網路搜尋之特定領域術語翻譯)
Advisor: 張俊盛 (Jason S. Chang)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2008
Academic year of graduation: 96
Language: English
Number of pages: 125
Keywords (Chinese): 機器翻譯, 術語翻譯, 網路語料庫, 擴充查詢
Keywords (English): Machine Translation, Terminology Translation, Web as Corpus, Query Expansion
  • We introduce a method for learning to find domain-specific translations of a given term on the Web. In our approach, the source term is transformed into an expanded query intended to maximize the probability of retrieving translations from a very large collection of mixed-code documents. The method involves automatically generating sets of target-language words from training data in specific domains and automatically evaluating those words for their effectiveness in retrieving documents that contain the sought-after translations. At run time, the given term is transformed into an expanded query and submitted to a search engine, and ranked translations are extracted from the document snippets that the search engine returns. We present a prototype search engine, TermMine, which applies the method to Web search engines. Evaluations on a set of terms show that TermMine outperforms state-of-the-art machine translation systems.
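The keyword evaluation step described in the abstract can be illustrated with a minimal sketch. This is not the weighting scheme used in the thesis, which the abstract does not specify. It assumes a simple hit rate: the fraction of training terms for which a query expanded with the candidate keyword returns at least one snippet containing the known translation. The names `hit_rate`, `rank_keywords`, and the `fetch_snippets` callback are hypothetical, and no real search engine API is called.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: (source_term, keyword) -> snippets returned for the
# expanded query. In practice this would wrap a search engine; here it is
# left to the caller so the sketch stays self-contained.
SnippetFetcher = Callable[[str, str], List[str]]


def hit_rate(
    keyword: str,
    training_pairs: Sequence[Tuple[str, str]],
    fetch_snippets: SnippetFetcher,
) -> float:
    """Fraction of training terms whose reference translation shows up in
    at least one snippet when the query is expanded with `keyword`.
    This scoring rule is an assumption, not the thesis's formula."""
    if not training_pairs:
        return 0.0
    hits = 0
    for source_term, reference in training_pairs:
        snippets = fetch_snippets(source_term, keyword)
        if any(reference in snippet for snippet in snippets):
            hits += 1
    return hits / len(training_pairs)


def rank_keywords(
    candidates: Sequence[str],
    training_pairs: Sequence[Tuple[str, str]],
    fetch_snippets: SnippetFetcher,
) -> List[Tuple[str, float]]:
    """Score every candidate keyword and sort by descending hit rate,
    breaking ties alphabetically so the order is deterministic."""
    scored = [(kw, hit_rate(kw, training_pairs, fetch_snippets)) for kw in candidates]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

To try the sketch offline, `fetch_snippets` can be a lookup into a dictionary of canned snippets keyed by `(source_term, keyword)`. The highest-ranked keywords would then be the ones used to expand queries for that domain.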


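    In this thesis, we propose a new method for extracting translations of domain-specific terms from the Web. The method first transforms a source-language term into an expanded query, which raises the chance that the search engine returns documents containing translations, so that relevant translations can be extracted accurately from the document snippets. For each knowledge domain, we train a set of target-language keywords in advance. These domain keywords help us collect documents from the Web that contain domain-relevant translations. At run time, the term to be translated is expanded with domain-related keywords into an effective query and submitted to a search engine, and the corresponding translations are extracted from the search results. We implemented the proposed method as a translation system named TermMine. Experimental and evaluation results show that the proposed method does improve the translation of domain-specific terms.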
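The run-time step (expand the term, query a search engine, extract translations from snippets) can be sketched as follows. The query format, the n-gram extraction heuristic, and the frequency-based ranking are all assumptions made for illustration; the abstract does not say how TermMine builds queries or ranks candidates. The functions `build_expanded_query` and `rank_candidates` are hypothetical names, and the snippets are passed in as plain strings rather than fetched from a real search engine.

```python
import re
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Runs of CJK unified ideographs. Treating these as target-language text
# is an assumption that fits an English-to-Chinese setting.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def build_expanded_query(
    source_term: str, domain_keywords: Sequence[str], max_keywords: int = 2
) -> str:
    """Quote the source term and append the first few domain keywords.
    The quoting convention is an assumption about the search engine."""
    return " ".join([f'"{source_term}"', *domain_keywords[:max_keywords]])


def rank_candidates(
    snippets: Iterable[str],
    source_term: str,
    expansion_keywords: Iterable[str] = (),
    min_len: int = 2,
    max_len: int = 4,
) -> List[Tuple[str, int]]:
    """Count CJK character n-grams in snippets that mention the source term.

    Each n-gram is counted once per snippet. The expansion keywords are
    excluded because the expanded query itself makes them frequent.
    Candidates are sorted by count, then by length (longer first)."""
    excluded = set(expansion_keywords)
    term = source_term.lower()
    counts: Counter = Counter()
    for snippet in snippets:
        if term not in snippet.lower():
            continue  # skip snippets that do not mention the source term
        grams = set()
        for run in CJK_RUN.findall(snippet):
            for n in range(min_len, max_len + 1):
                for i in range(len(run) - n + 1):
                    grams.add(run[i:i + n])
        counts.update(grams - excluded)
    return sorted(counts.items(), key=lambda item: (-item[1], -len(item[0]), item[0]))
```

Preferring longer n-grams among equally frequent candidates is one simple way to keep a full translation ahead of its own substrings. A real system would likely need a stronger ranking signal, such as proximity to the source term in the snippet.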

    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Figures
    1 Introduction
    2 Related Work
    3 The TermMine System
    3.1 Problem Statement
    3.2 Learning How to Transform Source Terms into Effective Queries
    3.2.1 Selecting Training Terms
    3.2.2 Generating and Filtering Candidate Keywords
    3.2.3 Weighting and Ranking Candidate Words for Query Expansion
    3.3 Extracting Translations via Query Expansion at Run Time
    4 Experimental Settings
    4.1 Training TermMine
    4.2 Comparing Different Sources of Training Data
    4.3 Translation Systems Compared
    4.4 Evaluation Metrics
    4.5 Web-based Relevance Feedback
    4.6 Other Advanced Evaluations
    4.7 Giving More Information for Google Translate
    5 Evaluation Results
    5.1 Comparing Training Data from Different Sources
    5.2 Results of Overall Evaluations
    5.3 Evaluations on OOV and DST Terms
    5.4 Testing Google Translate in Sentential Context
    6 Future Work and Summary
    References
    Appendix A – Lists of Topics from Wikipedia Used in Experiment
    Appendix B – Key Words Generated
    Appendix C – Key Words Generated from ETDS
    Appendix D – Experimental Results
    Appendix E – Contexts Used for Testing Google Translate

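    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)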
