
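Author: Chen, Yu-Ru (陳郁儒)
Thesis title: Mining Bilingual Collocations on the Web (利用網路搜尋搭配詞翻譯)
Advisor: Chang, Jason S. (張俊盛)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 53
Keywords (Chinese): 搭配詞翻譯, 查詢擴展, 機器輔助翻譯
Keywords (English): collocation translation, query expansion, computer-assisted translation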
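    This thesis describes a query-expansion-based method that automatically learns to retrieve translations of collocations from Web data through effective query expansion. For a given Chinese collocation, the method automatically learns query expansion terms, which are then used to search for translations through a search engine. In the training stage, we use a parallel corpus to obtain corpus translations and to train the query expansion terms, and we verify their effectiveness against Web resources. At run time, the input Chinese collocation is automatically converted into a set of query strings and sent to a search engine; candidate translations are then extracted, filtered and ranked by similarity, and the likely translations are presented. Beyond the translations contained in the parallel corpus, the method finds additional reference translations on the Web. Experimental results show that the method is helpful to second-language learners, translators, and machine translation systems.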


    In this paper, we introduce a new method for learning to find translation equivalents of a given collocation on the Web based on a query expansion strategy. Our approach involves finding translations in a parallel corpus and learning query expansion (QE) terms for the given collocation in order to bias search engines towards returning top-ranked snippets that contain the sought-after translations. We use the corpus translations from the parallel corpus and attempt to learn additional QE terms for retrieving more translations on the Web. The query expansion method is trained on a parallel corpus and validated on the Web. At run time, a given collocation is automatically transformed into a set of queries and sent to a search engine. Candidate translations are then retrieved from the returned snippets and ranked according to their similarity to the corpus translations. Our method provides significantly more translation equivalents from the Web in addition to the translations found in the parallel corpus, which could be used to assist language learners and translators and to support the development of machine translation systems.
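    The run-time process described in the abstract (query generation, candidate extraction from snippets, similarity-based filtering and ranking) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the function names, the query format, the n-gram candidate extraction, the Dice word-overlap similarity, and the threshold are all hypothetical stand-ins. No search engine is called; snippets are passed in as plain strings.

```python
import re
from collections import Counter


def build_queries(collocation, qe_terms):
    """Form one query per expansion term: the quoted collocation plus the term.
    The query format is an assumption, not the thesis's format."""
    return [f'"{collocation}" {term}' for term in qe_terms]


def extract_candidates(snippets, max_len=3):
    """Count every English word n-gram (1..max_len words) found in the snippets."""
    counts = Counter()
    for snippet in snippets:
        words = re.findall(r"[a-z]+", snippet.lower())
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def dice(a, b):
    """Dice coefficient over word sets (a stand-in similarity measure)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def rank_candidates(counts, corpus_translations, threshold=0.5):
    """Keep candidates similar enough to any corpus translation, then sort by
    similarity, snippet frequency, and finally alphabetical order."""
    scored = []
    for cand, freq in counts.items():
        score = max(dice(cand, ref) for ref in corpus_translations)
        if score >= threshold:
            scored.append((cand, score, freq))
    scored.sort(key=lambda x: (-x[1], -x[2], x[0]))
    return [cand for cand, _, _ in scored]


# Hypothetical inputs: one collocation, made-up snippets, one corpus translation.
queries = build_queries("吃藥", ["take", "English"])
snippets = ["吃藥 take medicine after meals", "how to take medication 吃藥"]
ranked = rank_candidates(extract_candidates(snippets), ["take medicine"])
print(queries)
print(ranked[:3])
```

    In this sketch the corpus translations act as an anchor: a Web candidate survives only if it shares enough words with a translation already found in the parallel corpus, which is one simple way to read "ranked according to their similarity to the corpus translations".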

    Abstract (in Chinese)
    ABSTRACT
    Acknowledgements
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
    CHAPTER 2 RELATED WORK
    CHAPTER 3 METHOD
        3.1 Problem Statement
        3.2 Learning to Generate Queries
            3.2.1 Retrieving Corpus Translations
            3.2.2 Generating Input-dependent QE Terms
            3.2.3 Learning to Generate Input-independent QE Terms
        3.3 The Run Time Process
    CHAPTER 4 Experimental Setting and Results
        4.1 Experimental Settings
        4.2 Methods Compared
        4.3 Evaluation Data Sets and Metrics
        4.4 Evaluation Results
        4.5 Discussion and Error Analysis
    CHAPTER 5 Conclusion and Future Work
    References
    Appendix A - Training data
    Appendix B - Test data
    Appendix C - Sample output


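    Full-text release date: this full text is not authorized for public release (on-campus network)
    Full-text release date: this full text is not authorized for public release (off-campus network)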
