
Author: Chung, Hsin-Yun (鍾幸芸)
Thesis Title: Grammar Level Auto-Complete for Assistive Writing (輔助寫作的文法提示系統)
Advisor: Chang, Jason S. (張俊盛)
Committee Members: Jang, Jyh-Shing (張智星); Chung, Siaw-Fong (鍾曉芳)
Degree: Master
Department: Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 112 (ROC calendar)
Language: English
Number of Pages: 35
Keywords (Chinese): 文法提示 (grammar hints), 文字生成 (text generation)
Keywords (English): Grammar Pattern, Text Generation
Access Count: Views: 49; Downloads: 0
    This thesis presents GrammarGenie, a writing-suggestion system that automatically predicts the grammar patterns likely to follow an incomplete sentence. We crawl example sentences and their grammar patterns from a dictionary, convert them into the required training format, and build the system by fine-tuning the large language model T5 (Text-to-Text Transfer Transformer). Experimental results show that the system not only predicts the grammar patterns of held-out dictionary example sentences very well, but also performs strongly in practical use.


    We present a method that automatically generates the grammar patterns that can follow a given incomplete sentence. In our approach, a partial sentence is enough for the system to predict probable grammar patterns. The method involves crawling example sentences from a dictionary, converting them into pairs of incomplete sentences and grammar patterns, and using this data to fine-tune a large language model to complete the incomplete sentences with grammar patterns. At run time, the system receives a partial sentence and outputs the highest-probability grammar pattern. Evaluation on a set of open-course transcripts shows that the system has excellent predictive capability on average. Our methodology supports combining many example sentences, resulting in improved model accuracy.
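    The abstract sketches an end-to-end pipeline: dictionary example sentences are truncated into partial sentences paired with the grammar patterns that continue them, T5 is fine-tuned on those pairs, and at run time beam search returns the most probable pattern for a new partial sentence. Below is a minimal sketch of this text-to-text framing using the HuggingFace transformers library; the checkpoint (t5-base), the "predict pattern:" task prefix, the optimizer settings, and the training pair are illustrative assumptions, not the thesis's actual configuration, which this record does not specify.

        # Minimal sketch of the pipeline described in the abstract
        # (checkpoint, prefix, and example pair are assumptions, see above).
        import torch
        from transformers import T5ForConditionalGeneration, T5TokenizerFast

        tokenizer = T5TokenizerFast.from_pretrained("t5-base")        # assumed checkpoint
        model = T5ForConditionalGeneration.from_pretrained("t5-base")

        # Hypothetical training pair: a dictionary example sentence is truncated,
        # and the target is a COBUILD-style grammar pattern that could continue it.
        source = "predict pattern: She persuaded him"   # assumed task prefix
        target = "V n to-inf"                           # assumed pattern notation

        enc = tokenizer(source, return_tensors="pt")
        labels = tokenizer(target, return_tensors="pt").input_ids

        # One fine-tuning step with the standard seq2seq cross-entropy loss.
        model.train()
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
        loss = model(**enc, labels=labels).loss
        loss.backward()
        optimizer.step()

        # Run time: beam search decodes the highest-probability grammar pattern
        # for a new partial sentence.
        model.eval()
        with torch.no_grad():
            query = tokenizer("predict pattern: The teacher encouraged us",
                              return_tensors="pt")
            out = model.generate(**query, num_beams=5, max_new_tokens=16)
        print(tokenizer.decode(out[0], skip_special_tokens=True))

    In practice, the single gradient step above would become a loop (or transformers' Trainer) over all crawled sentence/pattern pairs, and each example sentence can be truncated at several points to yield multiple pairs, which matches the abstract's note that combining many example sentences improves accuracy.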

    Abstract
    摘要 (Chinese Abstract)
    致謝 (Acknowledgements)
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    3 Method
        3.1 Problem Statement
        3.2 Prepare Training Data and Train Model
        3.3 Run-Time Grammar Pattern Prediction
    4 Experiment
        4.1 Datasets and Pre-trained Model
        4.2 Systems Compared
        4.3 Test Data
        4.4 Evaluation Metrics
    5 Evaluation Results
        5.1 Results of Automatic Evaluation
        5.2 Results of Human Evaluation
    6 Conclusion and Future Work
    References

    Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–480, 1992.
    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
    Jim Chang and Jason S. Chang. WriteAhead2: Mining lexical grammar patterns for assisted writing. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, pages 106–110, 2015.
    Mia Xu Chen, Benjamin N. Lee, Gagan Bansal, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang, Andrew M. Dai, Zhifeng Chen, et al. Gmail Smart Compose: Real-time assisted writing. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2287–2295, 2019.
    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
    Susan Hunston and Gill Francis. Pattern Grammar: A Corpus-driven Approach to the Lexical Grammar of English. John Benjamins Publishing Company, 2000. ISBN 9789027222732. URL https://books.google.com.tw/books?id=nqqh46Q0uVMC.
    Susan Hunston, Gill Francis, and Elizabeth Manning. Collins COBUILD Grammar Patterns 1: Verbs. HarperCollins, 1996. ISBN 0003750620.
    Susan Hunston, Gill Francis, and Elizabeth Manning. Collins COBUILD Grammar Patterns 2: Nouns and Adjectives. HarperCollins, 1998. ISBN 9780003750676.
    Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft). 2024. URL https://web.stanford.edu/~jurafsky/slp3/.
    Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, 2004.
    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space, 2013. URL https://arxiv.org/abs/1301.3781.
    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, 2002.
    Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
    Tzu-Hsi Yen, Jian-Cheng Wu, Jim Chang, Joanne Boisson, and Jason S. Chang. WriteAhead: Mining grammar patterns in corpora for assisted writing. In Proceedings of ACL-IJCNLP 2015 System Demonstrations, pages 139–144, 2015.
    Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675, 2019.
