Graduate student: | 江詠筑 Chiang, Yung-Chu |
---|---|
Thesis title: | WordGenie:從已標注錯誤之英文學習者語料庫分析易混淆詞及教學 (WordGenie: Learning analytics of miscollocations in an annotated learner corpus) |
Advisor: | 張俊盛 Chang, Jyun-Sheng |
Oral defense committee: | 陳浩然 Chen, Hao-Jan; 杜海倫 Tu, Hai-Lun; 蕭若綺 Hsiao, Jo-Chi |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
Year of publication: | 2025 |
Academic year of graduation: | 113 |
Language: | English |
Number of pages: | 69 |
Keywords (Chinese): | 易混淆詞語料庫, 語義分群, 詞頻分析, 大型語言模型, 生成式微課程, 提示詞工程 |
Keywords (English): | Confusable Words Corpus, Semantic Collocate Clustering, Frequency Analysis, Large Language Model, Micro-lesson Generation, Prompt Engineering |
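This thesis proposes a method for classifying context-appropriate replacement words and collocates, based on errors that commonly occur in the writing of English learners. The method extracts learner errors and their corrected collocation pairs from multiple corpora, combines word-frequency statistics with contextual clustering, and then ranks and filters the results from several statistical perspectives. Based on this method, we developed a system named WordGenie, which contains analysis results for a large number of erroneous words that frequently appear in learner corpora and helps learners overcome writing errors involving confusable words. Experimental results show that WordGenie fills gaps left by existing grammar and writing tools in three areas: finding confusable words, suggesting replacement words that fit the semantic context, and providing vocabulary learning material that is easy to remember. It can thereby reduce learners' cognitive load and make learning easier and more effective. The system also supports the integration of multiple corpus resources and combines frequency-oriented, context-aware, and semantic-clustering mechanisms to improve the accuracy and practicality of its vocabulary suggestions.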
We introduce a method for generating reference material on confusable words from annotated English learner corpora. In our approach, corpora are transformed into quantitative frequency data in order to find the most common errors, corrections, and collocations. The method involves clustering the collocation patterns of replacement candidates, ranking the error, replacement, and collocation data by frequency of occurrence in an annotated corpus, and filtering them using statistical features. At run-time, confusable words are identified in learner input, and the system retrieves and ranks alternative word suggestions by grouping and ranking the structured statistical data. We present a prototype system, WordGenie, that implements our confusable-word learning method. Blind evaluation on a set of confusable-word instances shows that WordGenie significantly outperforms existing online writing tools in identifying confusable words and delivering context-sensitive vocabulary suggestions, while reducing learners' cognitive load. Our methodology supports the integration of multiple corpus resources and combines frequency-oriented ranking, context awareness, and semantic grouping to improve the accuracy and practicality of vocabulary suggestions.
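
The frequency-based ranking and filtering step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `(error, correction, collocate)` record layout, the sample data, and the names `build_index` and `suggest` are hypothetical and are not taken from WordGenie. Collocates are grouped only by the correction they occur with; the semantic clustering described in the thesis is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical annotated records: (erroneous word, correction, collocate).
# Invented for illustration; not drawn from the thesis data.
ANNOTATIONS = [
    ("make", "do", "homework"),
    ("make", "do", "homework"),
    ("make", "do", "research"),
    ("make", "take", "photo"),
    ("say", "tell", "story"),
    ("say", "tell", "truth"),
    ("say", "speak", "language"),
]


def build_index(annotations):
    """Count corrections per error word and collocates per (error, correction)."""
    corrections = defaultdict(Counter)
    collocates = defaultdict(Counter)
    for error, correction, collocate in annotations:
        corrections[error][correction] += 1
        collocates[(error, correction)][collocate] += 1
    return corrections, collocates


def suggest(error, corrections, collocates, min_count=1, top_n=3):
    """Rank corrections for `error` by frequency, dropping rare ones.

    Returns a list of (correction, count, top collocates) tuples.
    """
    ranked = []
    # Use .get so that looking up an unseen word does not mutate the index.
    for correction, count in corrections.get(error, Counter()).most_common():
        if count < min_count:
            continue  # simple frequency threshold standing in for statistical filtering
        pair_counts = collocates.get((error, correction), Counter())
        top = [word for word, _ in pair_counts.most_common(top_n)]
        ranked.append((correction, count, top))
    return ranked


corrections, collocates = build_index(ANNOTATIONS)
for correction, count, top in suggest("make", corrections, collocates):
    print(correction, count, top)
```

Raising `min_count` plays the role of the filtering step: with `min_count=2`, only corrections observed at least twice for a given error word are kept as suggestions.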