Graduate student: | 鍾幸芸 Chung, Hsin-Yun |
---|---|
Thesis title: | 輔助寫作的文法提示系統 Grammar Level Auto-Complete for Assistive Writing |
Advisor: | 張俊盛 Chang, Jason S. |
Committee members: | 張智星 JANG, Jyh-Shing; 鍾曉芳 Chung, Siaw-Fong |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2024 |
Academic year of graduation: | 112 |
Language: | English |
Number of pages: | 35 |
Keywords (Chinese): | 文法提示 (grammar suggestion), 文字生成 (text generation) |
Keywords (English): | Grammar Pattern, Text Generation |
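This thesis presents GrammarGenie, a writing suggestion system that automatically predicts the grammar pattern that follows an incomplete sentence. We crawl example sentences and grammar patterns from a dictionary, convert them into the required format as training data, and build the system by fine-tuning the large language model T5 (Text-to-Text Transfer Transformer). Experimental results show that the system predicts the grammar patterns of example sentences from other dictionaries very well and also performs strongly in practical use.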
We present a method that automatically generates the grammar patterns corresponding to a given incomplete sentence. In our approach, a partial sentence is the input from which the system predicts probable grammar patterns. The method involves crawling example sentences from a dictionary, converting these sentences into incomplete sentences paired with grammar patterns, and using this data to fine-tune a large language model to fill in the incomplete sentences. At run-time, the system receives a partial sentence and outputs the highest-probability grammar pattern. Evaluation on a set of open-course transcripts shows that the system has excellent predictive capability on average. Our methodology supports combining many example sentences, which improves model accuracy.
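The data-conversion step described above can be illustrated with a minimal sketch. It is not the thesis's actual preprocessing: the `Example` record, the `make_pair` helper, the `predict pattern: ` task prefix, and the choice to cut the sentence right after the headword are all assumptions made for illustration. The idea is only that each dictionary example sentence with a known grammar pattern yields one (partial sentence, pattern) pair in the text-to-text form a T5-style model is fine-tuned on.

```python
from dataclasses import dataclass
from typing import Optional

PUNCT = ".,;:!?\"'"


@dataclass
class Example:
    """Hypothetical record for one crawled dictionary example."""
    sentence: str   # full example sentence
    headword: str   # surface form of the headword as it appears in the sentence
    pattern: str    # grammar pattern attached to the example, e.g. "V to n for n"


def make_pair(ex: Example, prefix: str = "predict pattern: ") -> Optional[dict]:
    """Truncate the sentence after the headword and pair it with the pattern.

    Returns None when the headword cannot be found in the sentence.
    The truncation point and the prefix are illustrative assumptions.
    """
    tokens = ex.sentence.split()
    for i, tok in enumerate(tokens):
        cleaned = tok.strip(PUNCT)
        if cleaned.lower() == ex.headword.lower():
            partial = " ".join(tokens[:i] + [cleaned])
            return {"input": prefix + partial, "target": ex.pattern}
    return None


pair = make_pair(
    Example("She apologized to him for the delay.", "apologized", "V to n for n")
)
print(pair)
```

Pairs in this shape can be fed to any sequence-to-sequence fine-tuning setup, with `input` as the source text and `target` as the text to generate. At run-time the same prefix would be applied to the writer's partial sentence before decoding.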