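| Graduate student: | 陳奕均 |
|---|---|
| Thesis title: | Associating Collocations with WordNet Senses Using Hybrid Models (利用混合式模型聯結搭配詞與詞網詞意) |
| Advisor: | 張俊盛 |
| Committee members: | 梁婷, 高照明 |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
| Year of publication: | 2012 |
| Academic year of graduation: | 100 (ROC calendar) |
| Language: | English |
| Number of pages: | 42 |
| Keywords (Chinese): | 搭配詞分類、字義解岐、詞網、最大熵模型、意譯 |
| Keywords (English): | collocation classification, word sense disambiguation, WordNet, maximum entropy model, paraphrase |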
In this thesis, we introduce a hybrid method for associating English collocations with sense classes selected from WordNet. The hybrid model combines a learning-based method, a paraphrase-based method, and a sense frequency ranking method. At training time, we prepare a set of collocations tagged with their sense classes, and train the learning-based model using sentences extracted from a large corpus together with cross-lingual information. At run time, the sense of an input collocation is decided by majority voting over three predictions: (1) the sense predicted by the learning-based model; (2) the sense predicted by the paraphrase-based model; and (3) the sense predicted by sense frequency ranking. The input collocation is assigned the sense that receives the most votes. Evaluation shows that the hybrid model significantly outperforms the other methods compared in this thesis. The method yields more reliable pairings of collocations with senses, which can help lexicographers compile collocation dictionaries and help learners understand collocation usage.
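The run-time voting step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `vote_sense`, the example sense labels, and in particular the tie-breaking rule (falling back to the earliest-listed predictor when all three disagree) are assumptions made here for illustration, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def vote_sense(predictions):
    """Return the sense label with the most votes.

    `predictions` lists one predicted sense per method, e.g.
    [learning_based, paraphrase_based, frequency_ranking].
    Tie-breaking is an assumption of this sketch: among equally
    voted senses, the one proposed by the earliest method wins.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    # Scan in input order so ties resolve to the earliest predictor.
    for sense in predictions:
        if counts[sense] == top:
            return sense


# Hypothetical sense labels, for illustration only.
winner = vote_sense(["feeling", "act", "feeling"])
```

In this example two of the three methods agree on `feeling`, so that label is assigned to the collocation. A different fallback, such as always preferring the sense frequency ranking, would only change the order in which the predictions are listed.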