
Author: 黃仲淇
Huang, Chung-chi
Thesis title: 互動式電腦輔助翻譯與寫作助手
An Interactive Computer-Aided Translation and Writing Assistant
Advisor: 張俊盛
Chang, Jason S.
Oral examination committee: 梁婷
Liang, Tyne
高照明
Gao, Zhao-Ming
張智星
Jang, Jyh-Shing Roger
陳克健
Chen, Keh-Jiann
Degree: Doctor
Department: 電機資訊學院 - 資訊系統與應用研究所
Institute of Information Systems and Applications
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: English
Number of pages: 165
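Keywords (translated from Chinese): word usage patterns, syntax-based, translation suggestions, text prediction, hints, unknown words, computer-assisted translation, computer-assisted writing, computer-assisted language learning, writing suggestion module, automatic machine translation evaluation metric, BLEU
Keywords (English): text completion, grammatical constructions, text prediction, sublexical/constituent translation, phraseological tendencies, n-grams, language and translation models, inverted files, pattern grammar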
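    This thesis proposes a method for suggesting the grammar and translations that continue a partial translation, with the aim of easing language learners' burden of word choice during translation, reducing their errors in grammar and phrase usage, and thereby improving writing quality, and writing productivity in particular. In our method, the grammar suggestions (which include word usage patterns) and the translation suggestions are generated in real time and are integrated into an interactive online writing platform, so users need not open separate lookup pages or rely on other systems. The method involves automatically extracting and combining translation candidates for unknown words, automatically analyzing target-language corpora to extract syntax-based word usage patterns, or phraseological tendencies, and automatically extracting bilingual translation pairs to support text prediction of the translation. At run time, the source text and the translation the user has entered so far are broken into n-grams to generate grammar, word usage, and translation suggestions for continuing the translation. These suggestions are evaluated, ranked, and combined on the fly and presented to the user as hints. We implement the method as a prototype system, TransAhead, and apply it to computer-assisted translation and computer-assisted writing, and even to computer-assisted language learning. Experimental results show that our unknown-word module provides acceptable translation candidates for the system's unknown words and mitigates the negative effects that unknown words have in existing translation systems (on the choice and ordering of surrounding words), while the writing suggestion module (that is, the word usage pattern module) clearly helps language learners in their writing, especially in the use of articles and prepositions. The overall evaluation finds that the translation and writing suggestions provided by the TransAhead prototype have considerable potential for translation and writing: on average, users' translation performance, measured with the automatic machine translation evaluation metric BLEU, improved significantly.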


    We introduce a learning method for predicting text completions and grammatical constructions to assist in translating a source text. In the proposed approach, predictions are offered on the fly during sentence translation to help the user make appropriate lexical and grammatical choices, thus improving writing quality and productivity. The method involves automatically extracting and evaluating sublexical/constituent translations for out-of-vocabulary (OOV) words (the OOV module for text prediction), automatically analyzing target-language sentences to generate general and syntax-based phraseological tendencies (the target-language writing suggestion module for grammar prediction), and automatically learning high-confidence word- or phrase-level translation equivalents (for text prediction). At run time, the source text and the translation prefix entered by the user are broken down into n-grams to generate grammar and translation predictions, which are then combined and ranked via translation and language models. The ranked prediction candidates are displayed to the user in a pop-up menu as translation or writing hints. We present a prototype writing assistant, TransAhead, that applies the method in a human-computer collaborative environment for computer-assisted translation and computer-assisted language learning. Experimental results show that the OOV module provides good translations for unknown words and lessens the impact of OOV words on translation quality. Language learners were also found to benefit substantially from the writing module’s phraseology information. Overall, our methodology supports inline text and grammar predictions and has great potential for assisting language learners or novice translators in translation, writing, or even language learning.
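    The run-time step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy translation table and pattern table, and the simple additive score are hypothetical stand-ins, not the thesis's actual data structures or its translation- and language-model ranking.

```python
from collections import defaultdict


def ngrams(tokens, max_n=3):
    """Return all contiguous n-grams (as tuples) of length 1..max_n."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def predict(source_tokens, prefix_tokens, translation_table, patterns,
            max_n=3, top_k=3):
    """Rank candidate hints for continuing a translation prefix.

    translation_table: hypothetical map from source n-gram to
        {target phrase: score}.
    patterns: hypothetical map from target n-gram to
        {continuation pattern: score}.
    """
    scores = defaultdict(float)

    # Translation predictions: look up every n-gram of the source text.
    for gram in ngrams(source_tokens, max_n):
        for target, score in translation_table.get(gram, {}).items():
            scores[target] += score

    # Grammar predictions: look up the n-grams that end the user's prefix.
    for n in range(1, min(max_n, len(prefix_tokens)) + 1):
        suffix = tuple(prefix_tokens[-n:])
        for continuation, score in patterns.get(suffix, {}).items():
            scores[continuation] += score

    # Assumed ranking: highest summed score first, ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy data, invented for this example only.
translation_table = {
    ("重要",): {"important": 0.6, "significant": 0.3},
    ("重要", "角色"): {"important role": 0.5},
}
patterns = {
    ("play",): {"an important role in": 0.4},
    ("we", "play"): {"a role": 0.2},
}

hints = predict(["我們", "扮演", "重要", "角色"], ["we", "play"],
                translation_table, patterns)
for text, score in hints:
    print(text, score)
```

    Keying both tables by n-gram tuples keeps the lookup a plain dictionary access, which is one simple way to make hints cheap enough to compute on every keystroke; a real system would replace the summed scores with proper model probabilities.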

    CHAPTER 1 Introduction
    CHAPTER 2 Related Work
        2.1 Computer-Assisted Translation
        2.2 Out-of-Vocabulary Issue in MT
        2.3 Phraseology Learning
    CHAPTER 3 The TransAhead System
        3.1 Problem Statement
        3.2 The OOV Module of Text Prediction
            3.2.1 Module Problem Statement
            3.2.2 Finding Sublexical/Constituent Translations
            3.2.3 Generating and Ranking OOV Translations
        3.3 The Writing Suggestion Module of Grammar Prediction
            3.3.1 Module Problem Statement
            3.3.2 Reference Corpus Preprocessing
            3.3.3 GRASP Usage Summary Construction
        3.4 Run-Time TransAhead Grammar and Text Prediction
    CHAPTER 4 Improving the TransAhead System
        4.1 Module Problem Statement
        4.2 Error Correcting Procedure
        4.3 Preliminary Evaluation on Error Correction
    CHAPTER 5 Experiments
        5.1 Evaluation on the OOV Module
            5.1.1 Underlying MT System and Data Sets Used
            5.1.2 Query Formats and Bilingual Resources
            5.1.3 Parameter Tuning
            5.1.4 Translation Results with/without the OOV Module
            5.1.5 Discussion
        5.2 Evaluation on the Writing Suggestion Module
            5.2.1 Experimental Setting
            5.2.2 Translation Results of Constrained Experiments
        5.3 Evaluation on TransAhead
    CHAPTER 6 Future Work and Summary
    References
    Publications
    Appendix A - OOV Types and Their Examples in the NIST MT-08 Test Set
    Appendix B - Example Translations with OOV Words Correctly Translated
    Appendix C - Example Translations with OOV Words Partially Translated
    Appendix D - Example Translations with OOV Words not in Combination Form
    Appendix E - Translation Task for Evaluation on GRASP Module
    Appendix F - Translation Task for Evaluation on GRASP Module (the post-test of the sampled GRASP users)
    Appendix G - Translation Task for Evaluation on GRASP Module (the post-test of the sampled non-GRASP users)
    Appendix H - Chinese Texts for Translation
    Appendix I - English Language Learners’ Translations with TransAhead Assistance
    Appendix J - English Language Learners’ Translations without TransAhead Assistance


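    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)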
