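Student: 吳易東
Wu, Yi-Dong
Thesis Title: 改善學生英文習作中之動詞 (Improving Verbs in Students' English Compositions)
Dealing with Improper Verbs in Writing based on Language Model
Advisor: 張俊盛
Chang, Jason S.
Committee Members: 陳浩然
高宏宇
白明弘
吳鑑城
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Computer Science
Year of Publication: 2021
Graduation Academic Year: 109 (ROC calendar, i.e., 2020-2021)
Language: English
Number of Pages: 45
Chinese Keywords: 文法改錯, 語言模型 (grammatical error correction, language model)
English Keywords: Grammatical Error Correction, Language Model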
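  • This thesis proposes a method for suggesting improvements to English verbs: for the verbs in a sentence, it automatically offers more appropriate alternatives. We build the system as a pipeline composed of a language model (LM) and a classifier. The method uses the language model to generate candidate verbs and the classifier to filter out inappropriate candidates. Experimental results show that our method achieves better results than the baseline.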


    We present a verb suggestion system that automatically detects the verbs in a given sentence and suggests a list of more appropriate alternatives for them. Our approach adopts a pipeline composed of a language model and a classifier: a masked language model generates alternative verbs, and a classifier filters out inappropriate alternatives. Preliminary evaluation shows that the proposed system outperforms the baseline.
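The two-stage pipeline described in the abstract (a masked language model proposes replacement verbs, then a classifier filters them) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: `toy_fill_mask` and `toy_classifier` are hypothetical stand-ins with made-up scores, used in place of a real masked LM (e.g., a BERT-style model) and a trained classifier so the example runs without downloading a model. The `[MASK]` token, `top_k`, the 0.5 threshold, and the example sentence are likewise assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

MASK = "[MASK]"  # BERT-style mask token (assumption about the LM's vocabulary)


def mask_verb(tokens: Sequence[str], verb_index: int) -> List[str]:
    """Return a copy of the sentence with the target verb replaced by the mask token."""
    masked = list(tokens)
    masked[verb_index] = MASK
    return masked


def generate_alternatives(
    tokens: Sequence[str],
    verb_index: int,
    fill_mask: Callable[[List[str]], Dict[str, float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Stage 1: rank the LM's fillers for the masked slot, dropping the original verb."""
    scores = fill_mask(mask_verb(tokens, verb_index))
    original = tokens[verb_index].lower()
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(word, p) for word, p in ranked if word.lower() != original][:top_k]


def filter_alternatives(
    tokens: Sequence[str],
    verb_index: int,
    candidates: List[Tuple[str, float]],
    classify: Callable[[Sequence[str], List[str]], float],
    threshold: float = 0.5,
) -> List[str]:
    """Stage 2: keep candidates whose rewritten sentence scores at or above the threshold."""
    kept = []
    for word, _ in candidates:
        rewritten = list(tokens)
        rewritten[verb_index] = word
        if classify(tokens, rewritten) >= threshold:
            kept.append(word)
    return kept


# Hypothetical stand-ins with made-up scores, so the sketch runs without a real model.
def toy_fill_mask(masked: List[str]) -> Dict[str, float]:
    # Ignores its input and returns a fixed distribution for the masked slot.
    return {"make": 0.40, "take": 0.25, "do": 0.15, "reach": 0.12, "draw": 0.08}


def toy_classifier(original: Sequence[str], rewritten: List[str]) -> float:
    # Looks up a fixed "appropriateness" score for the substituted verb.
    changed = next((new for old, new in zip(original, rewritten) if old != new), None)
    return {"make": 0.93, "reach": 0.71, "take": 0.42, "draw": 0.03}.get(changed, 0.0)


if __name__ == "__main__":
    sentence = "We need to do a decision soon .".split()
    candidates = generate_alternatives(sentence, 3, toy_fill_mask)
    print(candidates)  # [('make', 0.4), ('take', 0.25), ('reach', 0.12), ('draw', 0.08)]
    print(filter_alternatives(sentence, 3, candidates, toy_classifier))  # ['make', 'reach']
```

Keeping generation and filtering behind separate callables reflects the pipeline idea: the generator may over-generate plausible fillers, and the classifier threshold then trades the number of suggestions against their precision. Replacing the stand-ins with a real masked LM and a trained classifier would not change the control flow.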

    Abstract
    Abstract (Chinese)
    Acknowledgments
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    3 Methodology
        3.1 Problem Statement
        3.2 Prepare Training Data and Train a Classifier to Filter out Inappropriate Alternatives
            3.2.1 Extracting Verbs from a Training Corpus
            3.2.2 Generating Alternative Verbs with LM
            3.2.3 Constructing a Verbs Improvement Training Data
            3.2.4 Training a Classifier to Filter out Inappropriate Alternatives
        3.3 Run-Time Sentence Rephrasing and Filtering
    4 Experiment
        4.1 Datasets and Toolkits
        4.2 Data Preprocessing
        4.3 Model Implementation
            4.3.1 Masked Language Model
            4.3.2 Classifier
            4.3.3 Textual Entailment Model
        4.4 Threshold Settings
        4.5 Model Compared
        4.6 Evaluation Metrics
    5 Evaluation Results
        5.1 Results of FCE testing data
        5.2 Results from 100 Random Selection Data
    6 Conclusion and Future Work
    Reference

