
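Graduate student: Yuan-Chien Yang (楊媛茜)
Thesis title: Web-Based Semantic Processing for Self-Paced Language Learning and Assessment (應用於語言學習與測驗之網路為本語意分析)
Advisor: Jason S. Chang (張俊盛)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: English
Number of pages: 70
Chinese keywords: 語意分析, 語義辨析, 重述
English keywords: Semantic Processing, Word Sense Disambiguation, Paraphrasing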
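  • Computer Assisted Item Generation is a recently emerging research topic in Natural Language Processing (NLP) with considerable application potential: it can supply the automated tools that Computer Assisted Language Learning (CALL) urgently needs. Combining NLP with Web-as-Corpus techniques to support academic reading has also been a major research focus in recent years. The aim is to make reading material richer and easier to absorb, and also to support the semi-automatic generation of items for Reading Comprehension Tests.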

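    In this thesis, we propose a novel Web-based approach to Semantic Processing of articles in the Learned Genre. In the training stage, we take a randomly selected reading passage, apply part-of-speech tagging and base-phrase analysis, and then extract the passage's keywords and verb-noun collocations (V-N Collocation), which are used for Word Sense Disambiguation and Paraphrasing of the original text. We implemented the method and tested it on twenty randomly selected TOEFL reading passages, with separate evaluation schemes for noun sense disambiguation and verb paraphrasing. Under relatively strict test conditions, noun sense disambiguation reached a precision of nearly 60%, and verb paraphrasing reached a coverage of 85%. The experimental results show that Web-based semantic processing has considerable application potential: it avoids the shortage of information faced by conventional database-based semantic processing, and it can draw on the rich, varied, and continuously updated resources of the Web to support and enhance self-paced language learning.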


    There has been increasing interest in exploiting Natural Language Processing (NLP) technology in Computer Assisted Language Learning (CALL). Advances have been made in automatic rating of essays in standardized tests. There is also a need for automatic programs that generate test items that, after minor post-editing, are applicable in self-paced learning and low-stakes testing situations. This paper presents a novel NLP-based approach to facilitate the reading process of self-paced online learning, and to assist the semi-automatic generation of test items for reading comprehension tests (RCTs).
    The method involves identifying key words and key sentences, disambiguating the word senses of the key words, paraphrasing part of the sentences, and displaying disambiguated keyword definitions and paraphrased verb phrase alternatives. To do so, the senses of a word are transformed into a set of sense-related queries, which are combined with context information to collect disambiguation information or paraphrase data from the Web. We implement the proposed method based on the concept of Web-as-Corpus (WAC) for the semantic processing of word sense disambiguation and paraphrasing. Evaluation on a set of official TOEFL reading passages suggests that such a procedure is effective in terms of time, labor, and quality. Our methodology shows the potential of exploiting Web-based data, turning authentic texts into enriched reading materials, and assisting the generation of effective test items for reading comprehension tests.
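
    The query-based disambiguation step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the quoted-phrase query format, the averaging of hit counts, and the injected hit_count callable (a stand-in for a Web search hit counter) are all assumptions made for illustration, since this record does not specify how queries are formed or scored.

```python
from typing import Callable, Dict, List, Tuple


def build_queries(word: str, sense_terms: List[str], context: List[str]) -> List[str]:
    """Pair every sense-related term with every context word (hypothetical query format)."""
    return [f'"{word}" "{term}" "{ctx}"' for term in sense_terms for ctx in context]


def disambiguate(
    word: str,
    senses: Dict[str, List[str]],
    context: List[str],
    hit_count: Callable[[str], int],
) -> Tuple[str, Dict[str, float]]:
    """Pick the sense whose queries collect the most evidence on average.

    hit_count is an assumed callable that returns how many documents
    match a query; any search backend could be plugged in here.
    """
    if not senses:
        raise ValueError("at least one candidate sense is required")
    scores: Dict[str, float] = {}
    for sense_id, terms in senses.items():
        queries = build_queries(word, terms, context)
        # Average so that senses with many related terms are not favoured.
        total = sum(hit_count(q) for q in queries)
        scores[sense_id] = total / len(queries) if queries else 0.0
    best = max(scores, key=lambda s: scores[s])
    return best, scores


# Toy stand-in for a Web hit counter; the counts are invented for the demo.
def fake_hit_count(query: str) -> int:
    toy_counts = {'"bank" "river" "water"': 120, '"bank" "money" "water"': 5}
    return toy_counts.get(query, 0)


senses = {"bank#shore": ["river"], "bank#finance": ["money"]}
best, scores = disambiguate("bank", senses, ["water"], fake_hit_count)
print(best, scores)
```

    Passing the hit counter in as a parameter keeps the sketch runnable without network access and leaves the choice of search service open. The same shape could plausibly serve the paraphrasing step by scoring candidate verb alternatives instead of senses, although the record does not confirm that.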

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Computer Assisted Extensive Reading
      1.2 Computer Assisted Item Generation
      1.3 Organization
    Chapter 2 Related Work
    Chapter 3 Web-Based Semantic Processing
      3.1 Problem Statement
      3.2 Transform Dictionary Information into Effective Queries
        3.2.1 Prepare a Query Table for Disambiguating Key Noun Phrases
        3.2.2 Prepare a Query Table for Paraphrasing Verb Phrases
      3.3 A Web-based Procedure for Semantic Processing
        3.3.1 Preprocess the Given Context
        3.3.2 Disambiguate Word Sense of the Key Terms
        3.3.3 Paraphrase Part of the Key Sentences
        3.3.4 Output of the Alternative Sentences for the Extracted Key Sentences
    Chapter 4 Experiments and Analysis
      4.1 Training the Semantic Processing Query Tables
      4.2 Evaluation Metrics
        4.2.1 Metric for Key Word Sense Disambiguation
        4.2.2 Metric for Verb Phrase Paraphrasing
      4.3 Evaluation Results
        4.3.1 Evaluation Result of Key Word Sense Disambiguation
        4.3.2 Evaluation Result of Verb Phrase Paraphrasing
      4.4 Discussion
        4.4.1 Limitation and Future Development of the WSD Procedure
        4.4.2 Limitation and Future Development of the Paraphrasing Procedure
    Chapter 5 Conclusion and Future Work
      5.1 Future Work
      5.2 Conclusion
    References
    Appendix A – WordNet Glossary of Terms
    Appendix B – An Example Article in the Learned Genre
    Appendix C – Enriched Text Given in Appendix B
    Appendix D – Verb Frames in WordNet 2.0
    Appendix E – Evaluation Result of Noun Disambiguation (Sample of All-Word Setting)
    Appendix F – Evaluation Result of Noun Disambiguation (Sample of Sample-Word Setting)
    Appendix G – Evaluation Result of Verb Paraphrasing (Sample)


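    Full-text release date: full text not authorized for public release (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)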
