
Graduate student: 許瑋芩
Hsu, Wei-Chin
Thesis title: 學習使用文法規則與反向翻譯來翻譯片語
Learning to Use Grammar Patterns and Back-Translation to Translate Phrases
Advisor: 張俊盛
Chang, Jason S.
Oral defense committee: 張智星
陳浩然
楊謦瑜
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2021
Graduation academic year: 109
Language: English
Number of pages: 50
Keywords (Chinese): 文法規則, 反向翻譯, 機器翻譯
Keywords (English): Pattern Grammar, Back Translation, Machine Translation
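  • This thesis proposes a machine translation method that translates phrases more accurately and brings the output closer to naturally occurring language. In our approach, grammar patterns are converted into queries to generate training data in the source language (SL) and the target language (TL). The method involves automatically using grammar patterns to search for naturally occurring SL phrases (SP), translating SP with an existing translation system to obtain TL phrases (TP), automatically processing TP to produce common TL phrases (TP'), and obtaining SL phrases (SP') from TP' through machine translation. Finally, we use the SP'-TP' pairs (back-translation) to automatically train a machine translation model. We also present Phrasism, a prototype system built with this method. To train the model, Phrasism generates phrase training data through an n-gram retrieval system and an existing machine translation system. We build the data from scratch and test the feasibility of the method with a small amount of data. Evaluation by professionals shows that, although the model is trained on a small amount of data, the method still produces reasonable results. Our method generates training data based on grammar patterns (Pattern Grammar) and back-translation, and the resulting machine translation system can effectively translate common phrases.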


    We propose a method for translating phrases retrieved from a linguistic search engine of recurrent phrases. In our approach, grammar patterns are transformed into queries that generate training data made up of source language (SL) and target language (TL) phrases. The method involves four automatic steps: using grammar patterns to search for naturally occurring SL phrases (SP); translating SP into TL phrases (TP) with a given translation system; processing TP to create common TL phrases (TP') and back-translating them to obtain SL phrases (SP'); and training a new model on the back-translated SP'-TP' pairs. We present Phrasism, a prototype machine translation system. Phrasism generates parallel phrase training data by using a pattern-based n-gram retrieval system and an existing machine translation system. We generate phrases from scratch and test the feasibility of the method with low-resource phrase data. Human judgment and evaluation on a set of randomly selected phrases show that the method generates reasonably good translations. Our methodology creates phrase data based on grammar patterns and back-translation, which further improves a machine translation system on common phrases.
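    The four data-generation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not Phrasism's actual code: the function name `build_training_pairs`, the callables `search_sl`, `sl_to_tl`, `process_tl`, and `tl_to_sl`, and all toy data are hypothetical stand-ins for the n-gram retrieval system, the existing translation system, and the TP processing step, whose details the abstract does not give.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (SP', TP'): back-translated source phrase, common target phrase


def build_training_pairs(
    patterns: Iterable[str],
    search_sl: Callable[[str], List[str]],   # hypothetical: pattern query -> SL phrases (SP)
    sl_to_tl: Callable[[str], str],          # hypothetical: existing MT system, SL -> TL
    process_tl: Callable[[str], List[str]],  # hypothetical: TP -> common TL phrases (TP')
    tl_to_sl: Callable[[str], str],          # hypothetical: existing MT system, TL -> SL
) -> List[Pair]:
    """Collect SP'-TP' pairs following the four steps named in the abstract."""
    pairs: List[Pair] = []
    seen = set()
    for pattern in patterns:
        for sp in search_sl(pattern):            # step 1: find naturally occurring SP
            tp = sl_to_tl(sp)                    # step 2: translate SP to TP
            for tp_prime in process_tl(tp):      # step 3: derive common TL phrases TP'
                sp_prime = tl_to_sl(tp_prime)    # step 4: back-translate TP' to SP'
                pair = (sp_prime, tp_prime)
                if pair not in seen:             # keep each pair once, in first-seen order
                    seen.add(pair)
                    pairs.append(pair)
    return pairs


# Toy stand-ins (invented for illustration only) so the sketch runs end to end.
TOY_INDEX = {"V a role": ["play a role", "take a role"]}


def toy_search(pattern: str) -> List[str]:
    return TOY_INDEX.get(pattern, [])


def toy_sl_to_tl(sp: str) -> str:
    return "tl(" + sp + ")"       # fake "translation": wrap the phrase


def toy_process(tp: str) -> List[str]:
    return [tp.strip()]           # fake processing: pass the phrase through


def toy_tl_to_sl(tp: str) -> str:
    return tp[3:-1]               # fake back-translation: unwrap the phrase


pairs = build_training_pairs(
    ["V a role"], toy_search, toy_sl_to_tl, toy_process, toy_tl_to_sl
)
print(pairs)
```

    In a real setting the resulting pairs would be written out as parallel files and passed to an NMT toolkit for training; that stage is omitted here because the abstract does not specify it.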

    Abstract
    Abstract (in Chinese)
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    3 Methodology
      3.1 Problem Statement
      3.2 Generating Bilingual Phrase Data
        3.2.1 Generating source language phrases using grammar patterns
        3.2.2 Translate source language phrases to obtain target language phrases
        3.2.3 Process target language phrases to extract naturally occurring phrases
        3.2.4 Translate naturally occurring target language phrases
      3.3 Train NMT model
    4 Experiments
      4.1 Datasets
      4.2 Preprocessing training data
        4.2.1 Generating a small amount of training data
        4.2.2 Generating a large amount of training data
      4.3 Setting Hyper-parameters of NMT model
      4.4 Models and Systems Compared
        4.4.1 The Setting of the First Experiment
        4.4.2 The Setting of the Second Experiment
        4.4.3 Systems Compared
      4.5 Evaluation Phrase Translation and Judgments
    5 Results and Discussion
      5.1 Evaluations from the First Experiment
      5.2 Evaluations from the Second Experiment
    6 Conclusion and Future Work
    Reference
    Appendices

