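Author: Kate H. Kao (高紅雯)
Thesis title: A Thesaurus-Based Semantic Classification of English Collocations
Chinese title: 建立英語分類搭配詞典:搭配詞之語意分類與標示 (Building a Thesaurus-Based English Collocation Dictionary: Semantic Classification and Labeling of Collocations)
Advisor: Jason S. Chang (張俊盛)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Number of pages: 79
Keywords (Chinese): collocations, semantic concepts, concept classification, retrieval
Keywords (English): collocations, thesaurus, semantic classification, semantic relations, random walk algorithm, meaning access index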
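  • This thesis proposes a new method for automatically labeling and classifying collocations by semantic concept. It investigates how a semantic concept index of collocations relates to second-language learners' collocational competence and to collocation learning tools, and examines the practical benefit of applying a conceptual index structure to computer-assisted collocation learning tools, with the aim of building a thesaurus-based English collocation dictionary. Research on collocations has drawn growing attention. Most of it focuses on how collocation teaching improves English proficiency and holds that it can replace the traditional dichotomy of vocabulary and grammar teaching. Among collocation types, verb-noun and adjective-noun collocations are the hardest for second-language learners to master, and computer-assisted language learning tools can improve learners' efficiency. In the training stage, our method first uses a random walk algorithm together with a concept-indexed English dictionary to select, for each word, the WordNet sense most relevant to a given concept. Next, it uses the semantic relations encoded in WordNet to expand, through semantic links, the set of words most relevant to each concept. Finally, we select from a learner's dictionary the headwords whose collocational usage is hardest to master. For each of their verb-noun and adjective-noun collocations, the random walk algorithm is applied again to label the collocate automatically with its most relevant semantic concept. The originally scattered collocation entries are then grouped by concept, giving existing computer-assisted collocation search tools a systematic, concept-level retrieval structure. We implemented an automatic semantic labeling and classification system for collocations and, using 859 collocation entries from the JustTheWord language learning tool as test data, compared them against the collocation clusters produced by our concept-based classification system. The experiments yielded nearly 80% precision and 70% recall. The results show that our collocation clusters not only outperform JustTheWord's concept clusters but are also closer to human judgments, indicating that semantic labeling and classification with our method can indeed improve second-language learners' efficiency in learning collocations and the retrieval performance of existing collocation learning tools.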
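    The precision and recall figures reported in the abstract above can be illustrated with a small sketch. The record does not give the thesis's exact evaluation procedure, so the function below assumes a simple label-level comparison against a human-annotated gold standard. The collocations and the category names (GOODNESS, INTENSITY, POWER) are hypothetical and are not taken from the test data.

```python
def precision_recall(predicted, gold):
    """Label-level precision and recall.

    predicted, gold: dict mapping a collocation to the set of semantic
    labels assigned by the system and by human annotators, respectively.
    """
    keys = set(predicted) | set(gold)
    # A true positive is a (collocation, label) pair present in both.
    true_pos = sum(len(predicted.get(k, set()) & gold.get(k, set())) for k in keys)
    n_pred = sum(len(labels) for labels in predicted.values())
    n_gold = sum(len(labels) for labels in gold.values())
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
predicted = {
    "fine weather": {"GOODNESS"},
    "heavy rain": {"INTENSITY"},
    "strong tea": {"POWER"},
}
gold = {
    "fine weather": {"GOODNESS"},
    "heavy rain": {"INTENSITY"},
    "strong tea": {"INTENSITY"},
    "light rain": {"INTENSITY"},
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

    In this toy case two of the three predicted labels are correct, and two of the four gold labels are recovered.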


    New computational tools for extracting collocations are a great boon to language learners and lexicographers alike. This paper proposes a new method for organizing the very large number of collocates these tools return into semantic thesaurus categories. The approach introduces a thesaurus-based semantic classification model that automatically learns semantic relations for classifying adjective-noun (A-N) and verb-noun (V-N) collocations into categories. Because A-N and V-N pairs are the most frequent patterns of collocation error, and thus the most relevant to language learners, the research focuses on them. Our model uses a random walk over the vertices and edges of a weighted graph derived from WordNet semantic relations, and computes a stationary distribution over semantic labels with an iterative graphical algorithm. The semantic label of a collocate is scored by a novel divergence measure, which imposes a thesaurus structure on collocation reference tools. In our experiments, the resulting semantic relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments. The evaluation is conducted on a set of collocations, drawn from the collocation usage boxes of the Macmillan English Dictionary, whose collocates involve varying levels of abstractness. We also present an experimental evaluation on a collection of 150 multiple-choice questions, a format commonly used as a similarity benchmark in the TOEFL synonym test. The experimental results show that the imposed thesaurus structure helps L2 learners produce collocations and significantly outperforms existing collocation reference tools. The resulting semantic classification is closely consistent with human judgments, which serve as fairly refined examples for evaluating the model. The methodology improves the performance of collocation reference tools and imposes semantic structure on collocations, a good starting point for a much improved and more useful presentation of collocations. It also has positive consequences for the robustness of semantic classification of collocations, an attractive feature for organizing broad-coverage machine-readable data that can be merged and catalogued for use in natural language processing.
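    The random walk and stationary distribution mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the thesis's implementation. It assumes a PageRank-style damped walk solved by power iteration. The damping factor, the toy graph, and the node names (words in lowercase, thesaurus categories in uppercase) are all hypothetical and are not derived from WordNet.

```python
def stationary_distribution(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for a damped random walk on a weighted graph.

    weights: dict mapping node -> {neighbor: non-negative edge weight}.
    Returns a dict mapping node -> stationary probability.
    """
    nodes = sorted(set(weights) | {v for nbrs in weights.values() for v in nbrs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # With probability (1 - damping) the walker jumps to a random node.
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            nbrs = weights.get(u, {})
            total = sum(nbrs.values())
            if total == 0:
                # Dangling node: spread its mass uniformly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                # Follow each edge with probability proportional to its weight.
                for v, w in nbrs.items():
                    new[v] += damping * rank[u] * w / total
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy graph linking collocates to candidate semantic labels.
toy_graph = {
    "fine": {"GOODNESS": 2.0, "SMALLNESS": 1.0},
    "excellent": {"GOODNESS": 3.0},
    "GOODNESS": {"fine": 2.0, "excellent": 3.0},
    "SMALLNESS": {"fine": 1.0},
}

dist = stationary_distribution(toy_graph)
labels = {k: v for k, v in dist.items() if k.isupper()}
best_label = max(labels, key=labels.get)
print(best_label, round(labels[best_label], 3))
```

    Under these assumptions, the label node with the heaviest connections to the collocates accumulates the most probability mass, which is the intuition behind using the stationary distribution to pick a semantic label.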

    ABSTRACT (CHINESE)
    ABSTRACT
    TABLE OF CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
    CHAPTER 2 RELATED WORK
    2.1 COMPUTATIONAL COLLOCATION REFERENCE TOOLS AND L2 LEARNERS
    2.2 IMPOSING A THESAURUS STRUCTURE ON COLLOCATION REFERENCE TOOLS
    2.3 MEANING ACCESS INDEXING IN DICTIONARIES
    2.4 SIMILARITY OF SEMANTIC RELATIONS
    CHAPTER 3 METHODOLOGY
    3.1 PROBLEM WITH EXISTING COLLOCATION REFERENCE TOOLS
    3.2 LEARNING TO BUILD SEMANTIC KNOWLEDGE BY ITERATIVE GRAPHICAL ALGORITHM
    3.2.1 WORD SENSE ASSIGNMENT FOR INTEGRATED SEMANTIC KNOWLEDGE
    3.2.2 EXTENDING THE COVERAGE OF THESAURUS
    3.3 GIVING THESAURUS STRUCTURE TO COLLOCATION BY ITERATIVE GRAPHICAL ALGORITHMS
    CHAPTER 4 EXPERIMENTAL SETTING
    4.1 EXPERIMENTAL DATA
    4.1.1 INPUT DATA 1: A THESAURUS FOR SEMANTIC KNOWLEDGE INTEGRATION
    4.1.2 INPUT DATA 2: A WORD SENSE INVENTORY FOR SEMANTIC KNOWLEDGE EXTENSION
    4.2 EXPERIMENTAL CONFIGURATIONS
    4.2.1 STEP 1: INTEGRATING SEMANTIC KNOWLEDGE
    4.2.2 STEP 2: EXTENDING SEMANTIC KNOWLEDGE
    4.3 TEST DATA
    4.4 RUNTIME OF SEMANTIC CLASSIFICATION FOR COLLOCATIONS
    CHAPTER 5 RESULTS AND DISCUSSION
    5.1 EVALUATION METRICS
    5.1.1 PERFORMANCE EVALUATION FOR SEMANTIC CLUSTER SIMILARITY
    5.1.2 ASSESSMENT OF COLLOCATION CLUSTER PERFORMANCE
    5.1.3 CONFORMITY OF SEMANTIC LABELS
    5.2 EVALUATION RESULTS
    5.2.1 RESULTS OF THE COLLOCATION CLUSTER PERFORMANCE
    5.2.2 RESULTS OF THE CONFORMITY OF SEMANTIC LABELS
    CHAPTER 6 CONCLUSION
    6.1 SUMMARY OF THE FINDINGS
    6.2 IMPLICATIONS
    6.3 LIMITATIONS OF THE STUDY AND SUGGESTIONS FOR FUTURE RESEARCH
    REFERENCES
    APPENDIX 1. A HEADWORD EXTRACTION OF COLLOCATION USAGE BOX FROM MACMILLAN ENGLISH DICTIONARY
    APPENDIX 2. ASSESSMENT TEST ON THE PERFORMANCE OF COLLOCATION CLUSTERING
    APPENDIX 3. GUIDELINES FOR EVALUATING SEMANTIC LABELS
    APPENDIX 4. TERMINOLOGY GLOSSARY

    Benson, M. 1985. Collocations and Idioms. In R. Ilson (Ed.), Dictionaries, Lexicography and Language Learning (ELT Documents 120; Oxford: Pergamon), pp.61-8.
    Béjoint, H. 1994. Tradition and Innovation in Modern English Dictionaries. Oxford: Clarendon Press.
    Brants, T. and Franz, A. 2006. Web 1T 5-gram corpus version 1.1. Technical report, Google Research.
    Chen, Y. 2004. A corpus-based analysis of collocational errors in EFL Taiwanese High School students’ compositions. California State University, San Bernardino. June.
    Chklovski, T., and Pantel, P. 2004. VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP-04). pp. 33-40. Barcelona, Spain.
    Chen, J.N., and Chang, J. S. 1998. Topical clustering of MRD senses based on information retrieval techniques, Computational Linguistics, v.24 n.1, March 1998.
    Downing, S. M., Baranowski, R. A., Grosso, L. J., and Norcini, J. J. 1995. Item type and cognitive ability measured: The validity evidence for multiple true-false items in medical specialty certification. Applied Measurement in Education, 8:189-199.
    Deerwester, S. C., Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science (JASIS), 41(6):391–407.
    Firth, J.R. 1957. The Semantics of Linguistic Science. Papers in Linguistics 1934-1951. London: Oxford University Press.
    Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.
    Hall, G. 1994. Review of The Lexical Approach: The State of ELT and a Way Forward, by Michael Lewis. ELT Journal 44, 48.
    Heimlich, J.E., and Pittelman, S.D. 1986. Semantic mapping: Classroom applications. Newark, DE: International Reading Association.
    Hindle, D. 1990. Noun classification from predicate-argument structures. In Meeting of the Association for Computational Linguistics, pages 268-275.
    Hatzivassiloglou, V., and McKeown, K. R. 1993. Towards the automatic identification of adjectival scales: Clustering adjectives according to meaning. In Proceedings of the 31st Annual Meeting of the ACL, pages 172–182.
    Jian, J. Y., Chang, Y. C., and Chang, J. S. 2004. TANGO: Bilingual Collocational Concordancer. Poster and demo at ACL 2004, Barcelona.
    Johnson, D.D., and Pearson, P.D. 1984. Teaching reading vocabulary. New York: Holt, Rinehart & Winston.
    Kilgarriff, A. 1997. I Don’t Believe in Word Senses. Computers and the Humanities, 31(2):91-113.
    Kilgarriff, A. and D. Tugwell. 2001b. “WORD SKETCH: Extraction and Display of Significant Collocations for Lexicography” Proceedings of COLLOCATION: Computational Extraction, Analysis and Exploitation workshop, 39th ACL and 10th EACL, 32-38.
    Kemp, J. E., Morrison, G. R., and Ross, S. M. 1994. Developing evaluation instruments. In Designing Effective Instruction. New York, NY: MacMillan College Publishing Company, pp. 180-213.
    Lewis, M. 1997. Implementing the lexical approach. Hove, England: Language Teaching Publications.
    Lewis, M. 2000. Language in the Lexical Approach. In M. Lewis (Ed.), Teaching Collocation: Further Developments in the Lexical Approach. London: Language Teaching Publications.
    Landauer, T. and Susan T. Dumais. 1997. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240.
    Lesk, Michael E. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of ACM SIGDOC ’86, pages 24–26, New York, NY.
    Lin, D. 1997. Using syntactic dependency as local context to resolve word sense ambiguity. In Meeting of the Association for Computational Linguistics, pages 64-71.
    Liu, L. E. 2002. A corpus-based lexical semantic investigation of verb-noun miscollocations in Taiwan learners’ English. Tamkang University, Taipei, January.
    Morris, Jane and Graeme Hirst. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1):21–48.
    Nattinger, J.R. and DeCarrico, J.S. 1992. Lexical Phrases and Language Learning. Oxford: Oxford University Press.
    Nesselhauf, N. 2003. The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics 24(2), 223-242.
    Nation, I. S. P. 2001. Learning vocabulary in another language. Cambridge: Cambridge University Press.
    Nirenburg, S. and Raskin, V. 1987. The subworld concept lexicon and the lexicon management system, In Computational Linguistics, v. 13, December 1987.
    Nastase, V. and Szpakowicz, S. 2003. Exploring noun–modifier semantic relations. In Fifth International Workshop on Computational Semantics (IWCS-5), pages 285–301, Tilburg, the Netherlands.
    Padó, S. and Lapata, M. 2007. Dependency-Based Construction of Semantic Space Models. Computational Linguistics, 33(2):161-199.
    Roediger, H. L., III, and Marsh, E. J. 2005. The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155-1159.
    Readence, J. E., and Searfoss, L.W. 1986. Teaching strategies for vocabulary development. In E. K. Dishner, T. W. Bean, J. E. Readence, & D. W. Moore (Eds.), Reading in the content areas: Improving classroom instruction (2nd ed., pp. 183-188). Dubuque, IA: Kendall/ Hunt.
    Rehder, B., M. E. Schreiner, Michael B.W. Wolfe, Laham, D., Thomas K. Landauer, and Kintsch, W. 1998. Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25:337–354.
    Scholfield, P. 1982. Using the English dictionary for comprehension. TESOL Quarterly 16: 185-194.
    Salton, G. 1989. Automatic Text Processing: The transformation, analysis, and retrieval of information by computer. Addison-Wesley.
    Sinatra, R., Beaudry, I., Pizzo, I., & Geishart, G. 1994. Using a computer-based semantic mapping, reading and writing approach with at-risk fourth graders. Journal of Computing in Childhood Education, 5, 93-112.
    Tono, Y. 1984. On the Dictionary User's Reference Skills. Unpublished B.Ed. Thesis. Tokyo: Tokyo Gakugei University.
    Tono, Y. 1992. The Effect of Menus on EFL Learners' Look-up Processes. Lexikos 2 (AFRILEX Series). Stellenbosch: Buro van die WAT.
    Taba, H. 1967. Teacher's handbook for elementary social studies. Reading, MA: Addison-Wesley.
    Turney, Peter D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL’02), pages 417–424, Philadelphia, PA.
    Turney, Peter D. 2006. Similarity of Semantic Relations. Computational Linguistics, 32(3):379–416.
    Wanner, L., Bohnet, B., & Giereth, M. 2006. What is beyond Collocations? Insights from Machine Learning Experiments. EURALEX.

    Dictionaries

    Benson, M., Benson, E., & Ilson, R. 1986a. The BBI Dictionary of English Word Combinations. John Benjamins Publishing Company. ISBN 1-55619-521-4.
    Deuter, M., Greenan, J., Noble, J., & Phillips, J. 2002. Oxford Collocations Dictionary for Students of English. Oxford: Oxford University Press. ISBN 978-0-19-431243-1.
    Glazier, S. D., 1997. Random House Word Menu. Random House Publishing Group. 810pp. ISBN 978-0-34-541441-0.
    McArthur, T. 1981. Longman Lexicon of Contemporary English. Harlow, Essex: Longman, 910pp. ISBN 0-582-55527-2.
    Procter, P. (ed.) 1995. Cambridge International Dictionary of English. Cambridge: Cambridge University Press. ISBN 0-521-77575-2.
    Rundell, M. (ed.) 2002. Macmillan English Dictionary for Advanced Learners. Oxford: Macmillan Publishers. 692 pp. ISBN 0-333-95786-5.
    Summers, D. (Director) 1995. Longman Dictionary of Contemporary English (Third edition) Longman, Harlow. ISBN 0-582-43397-5.

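    Full-text release date: full text not authorized for public release (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)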
