Graduate student: | 凱 碧 Silvia Gabriela Herrera Poggio |
---|---|
Thesis title: | 篇章結構自動評分:運用 ChatGPT 進行資料標記 Automatic Organization Scoring: Leveraging ChatGPT for Data Annotation |
Advisors: | 張俊盛 Chang, Jason S.; 胡敏君 Hu, Anita |
Oral defense committee: | 高宏宇 Kao, Hung-Yu; 黃芸茵 Huang, Yun-Yin |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2024 |
Academic year of graduation: | 112 |
Language: | English |
Number of pages: | 33 |
Keywords (Chinese): | 作文自動評分、ChatGPT、評分模型 |
Keywords (English): | Automatic Essay Scoring, ChatGPT, Scoring Model |
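We propose a generative automatic scoring method that targets the organization aspect of essays. In this method, every sentence is annotated with additional organizational-structure information in order to maximize the model's accuracy when scoring organization. The method uses ChatGPT to annotate the sentences of intermediate-level student essays, which supplies additional structural features for each essay. These annotations are paired with the holistic essay scores assigned by professional English teachers to train an automatic scoring model, which can then produce a trait-level score for organization. At run time, each sentence of the input essay is tagged with organizational-structure features, and these tags are fed to the model as additional information to obtain the score. Evaluation on a test set of real student essays shows that the method scores essays at a level quite close to that of professional English teachers. This demonstrates that, even on essay datasets that lack trait-level scores, our method can effectively provide automatic trait-level scoring for organization with good performance.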
We introduce a method for automatically generating an organizational aspect score for student essays. In our approach, the essay is split into sentences, which are transformed into structurally annotated sentences with the aim of maximizing the probability of obtaining an accurate organizational score. The method involves leveraging ChatGPT to enrich an intermediate-level essay dataset with tags that highlight the structural characteristics of each essay, then training a model on the structurally annotated dataset and the corresponding holistic scores so that it can automatically generate organizational scores. At run-time, the input essay is transformed into a sentence-level structurally annotated essay, which is then fed into the model to derive the score. Blind evaluation on a set of real learner essays shows that the method achieves performance comparable to that of human evaluators. Our methodology supports automatic organization scoring and yields reasonably good performance.
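The run-time transformation described above (split the essay into sentences, attach a structural tag to each, and serialize the result as model input) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the tag names `INTRO`, `BODY` and `CONCLUSION`, the `<TAG> sentence` serialization, and the position-based `positional_tagger`, which only stands in for the ChatGPT annotation step and is not the thesis's actual prompt, tag set, or model input format.

```python
import re
from typing import Callable, List

# Hypothetical tagger signature: (sentence index, sentence count, sentence) -> tag.
Tagger = Callable[[int, int, str], str]


def split_sentences(essay: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", essay.strip())
    return [p for p in parts if p]


def positional_tagger(index: int, total: int, sentence: str) -> str:
    """Placeholder for the ChatGPT annotation step (assumption).

    It ignores the sentence text and tags by position only.
    """
    if index == 0:
        return "INTRO"
    if index == total - 1:
        return "CONCLUSION"
    return "BODY"


def annotate_essay(essay: str, tagger: Tagger = positional_tagger) -> str:
    """Return the essay as one string with a structural tag before each sentence."""
    sentences = split_sentences(essay)
    total = len(sentences)
    return " ".join(
        f"<{tagger(i, total, s)}> {s}" for i, s in enumerate(sentences)
    )


if __name__ == "__main__":
    essay = "I like dogs. They are loyal. So dogs are great pets."
    print(annotate_essay(essay))
```

In a setup like the one the abstract describes, the annotated string would serve as the input to the scoring model, and the tagger would be replaced by a function that returns the label produced by the annotation model for each sentence.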