
Student: Liu, Chung-Han (劉宗翰)
Title: Automatic Essay Relevancy Scoring with Pre-trained Language Models (用預訓練語言模型進行自動化作文相關性評分)
Advisor: Chang, Jason S. (張俊盛)
Committee members: Jang, Jyh-Shing (張智星); Chung, Siaw-Fong (鍾曉芳)
Degree: Master (碩士)
Department: Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science (電機資訊學院 - 資訊系統與應用研究所)
Year of publication: 2024
Academic year of graduation: 113 (ROC calendar)
Language: English
Number of pages: 37
Keywords (Chinese): 自動作文評分 (automatic essay scoring), 深度學習 (deep learning)
Keywords (English): Automatic Essay Scoring, Deep Learning
  • This thesis proposes a method for automatically generating a relevancy score between an essay prompt and a student essay. We split the essay and the prompt into sentences, and then use a pre-trained language model to learn the relevancy between the prompt and the essay. The method involves generating sentence embeddings with the language model, learning the relationships among these embeddings, and using the embeddings to train a classifier. At run time, the essay prompt and the student essay are each split into sentences and passed through a hierarchical model to produce the final score. The model scores student essays comparably to human raters and outperforms the baseline method.


    We introduce a method for automatically generating a relevancy score given a student essay and the essay prompt. In our approach, the essays and prompts are split into sentences. We then utilize transformer-based pre-trained language models to learn the relevancy between the prompt and the essay. The method involves generating sentence embeddings using sentence transformers, learning the relationships between these embeddings, and training a classifier based on them. At run time, essays and prompts are broken down into sentences and then passed into a hierarchical model to produce the final score. Blind evaluation on a set of real learner essays shows that it performs comparably to human raters and outperforms the baseline method.
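
    The pipeline described in the abstract can be sketched in a few lines of Python. The sketch below is a minimal illustration under stated assumptions, not the thesis implementation: it assumes NLTK for sentence splitting, the sentence-transformers library with the all-MiniLM-L6-v2 checkpoint for sentence embeddings, and a mean best-match cosine similarity as a stand-in for the trained hierarchical classifier.

        # Minimal sketch of the scoring pipeline from the abstract.
        # Assumptions (not from the thesis): NLTK sentence splitting,
        # the all-MiniLM-L6-v2 sentence-transformers checkpoint, and
        # mean best-match cosine similarity in place of the trained
        # hierarchical classifier.
        import nltk
        from nltk.tokenize import sent_tokenize
        from sentence_transformers import SentenceTransformer, util

        nltk.download("punkt", quiet=True)  # sentence tokenizer model
        model = SentenceTransformer("all-MiniLM-L6-v2")

        def relevancy_score(prompt: str, essay: str) -> float:
            # Split the prompt and the essay into sentences.
            prompt_sents = sent_tokenize(prompt)
            essay_sents = sent_tokenize(essay)
            # Encode each sentence into a fixed-size embedding.
            prompt_emb = model.encode(prompt_sents, convert_to_tensor=True)
            essay_emb = model.encode(essay_sents, convert_to_tensor=True)
            # Pairwise cosine similarities: shape (|essay|, |prompt|).
            sims = util.cos_sim(essay_emb, prompt_emb)
            # For each essay sentence, keep its best-matching prompt
            # sentence, then average over the essay.
            return sims.max(dim=1).values.mean().item()

        print(relevancy_score(
            "Describe a person who influenced you. Explain why.",
            "My grandmother shaped who I am. She taught me patience.",
        ))

    In the method the abstract describes, the per-sentence embeddings would instead feed a trained hierarchical model that predicts the final relevancy score, rather than this unsupervised similarity heuristic.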

    1 Introduction
    2 Related Work
    3 Methodology
      3.1 Problem Statement
      3.2 Data Preprocessing and Model Training
        3.2.1 Tokenizing and Splitting the Essay Prompt and Essay into Sentences
        3.2.2 Transforming Sentences into Sentence Embeddings
        3.2.3 Generating Relevancy Scores
      3.3 Run-time Relevancy Score Prediction
    4 Experiment
      4.1 Datasets and Toolkits
      4.2 Dataset Preparation
        4.2.1 Preparing the Dataset for Relevancy Scoring
        4.2.2 Expert Analytic Scoring
        4.2.3 Preparing the Dataset for Off-topic Essay Detection
      4.3 Relevancy Scoring Model Training
        4.3.1 Models Compared
      4.4 Training the Off-topic Essay Detection Model
        4.4.1 Models Compared
      4.5 Evaluation Metrics
        4.5.1 Evaluation Metrics for Relevancy Scoring
        4.5.2 Evaluation Metrics for Off-topic Essay Detection
    5 Evaluation Results
      5.1 Results of Expert-Annotated Relevancy Scores
      5.2 Results of Off-topic Essay Detection
    6 Conclusion and Future Work
    Appendices
    References

