
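Author: 謝靜婷 (ChingTing Hsieh)
Thesis title: 半自動建立中文WordNet之研究
Semi-Automatic Construction of Chinese WordNet -- Using Class-based Translation Model
Advisor: 張俊盛 (Jason S. Chang)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: English
Number of pages: 78
Keywords (translated from Chinese): Chinese WordNet, class-based probabilistic model, selection of word-sense translations
Keywords (English): Chinese WordNet, Class-based Statistical Model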
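  • In recent years, a growing number of knowledge-based natural language processing studies have used WordNet as their main source of lexical information. WordNet is an English lexical database built at Princeton University that records semantic relations between words, such as hypernyms, hyponyms, synonyms, and antonyms. With its wide application in machine translation, word sense disambiguation, and information retrieval, WordNet has become a standard for lexical semantics research. Many researchers are therefore attempting to build WordNets for other languages. To lay a foundation for Chinese lexical semantics research, the construction of a Chinese WordNet is a topic worth studying.
    Most current approaches to building a WordNet take the English WordNet as the core, obtain its top-level senses as a set of base concepts, and then extend them. There are broadly two ways to extend the core: expansion, which translates the senses of the English WordNet into the target language, and merging, which merges the core WordNet structure with a language-specific structure that already exists. Both approaches run into the same problem, namely word sense disambiguation and translation selection. When building a Chinese WordNet, we must decide, for each polysemous word, which Chinese translation corresponds to a given sense. In examining a bilingual (English-Chinese) thesaurus, we found that the Chinese translations of English words in the same class often share many Chinese characters. This thesis therefore proposes a class-based probabilistic model that selects the translation for a word sense by computing the occurrence probabilities of these shared characters.

    First, we classify the nouns in WordNet in order to obtain larger concept groups. Next, we use a bilingual (English-Chinese) dictionary to obtain all Chinese translations of each English word. Because the Chinese translations within one class tend to share a particular set of characters, computing the probability of each Chinese character within each class lets us select the most appropriate Chinese translation for the English words in that class. In this way, a preliminary Chinese WordNet can be built. In our experimental design, we consider only nouns and use SEMCOR as the source of the evaluation corpus. Preliminary results show a coverage of 76.43%, and when up to three translation candidates are provided, recall reaches nearly 90%.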


    WordNet is a lexical database that organizes English nouns, verbs, adjectives, and adverbs according to word senses and the relationships between senses. It has been applied increasingly to many knowledge-based NLP tasks as the main lexical resource because of its wide-coverage semantic and conceptual information. WordNets for many European languages other than English have been under development in recent years. This paper proposes an approach to the semi-automatic construction of a Chinese WordNet using a class-based statistical model.
    Our approach to the problem of constructing a Chinese WordNet is via translation of the English WordNet. The main problem we have to tackle is selecting the appropriate word translation for each word sense. We observe that English words for a common concept tend to have common Chinese characters in their translations. Our method consists of 1) classifying English words into several semantic classes and 2) building a class-based statistical model for estimating word translation probabilities. We have carried out experiments on the nouns in WordNet and evaluated our results in terms of coverage and recall rate.
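
    The class-based idea described above can be illustrated with a minimal sketch. It is not the model from the thesis, whose exact formulation is not given in this record. It assumes a simple relative-frequency estimate of P(character | class) and scores each candidate translation by its mean character probability. The class labels, the toy dictionary, and the function names `train_char_probs` and `rank_translations` are all hypothetical.

```python
from collections import Counter

# Toy bilingual dictionary (hypothetical entries, for illustration only).
TRANSLATIONS = {
    "plant": ["植物", "工廠"],
    "herb": ["草本植物"],
    "vegetation": ["植被"],
    "factory": ["工廠"],
    "workshop": ["工場"],
}

# Hypothetical semantic classes; the ambiguous word "plant" belongs to both.
CLASS_MEMBERS = {
    "flora": ["plant", "herb", "vegetation"],
    "facility": ["plant", "factory", "workshop"],
}


def train_char_probs(class_members, translations):
    """Estimate P(character | class) by relative frequency.

    Every translation of every member word is counted, because at
    training time we do not know which translation fits which sense.
    """
    probs = {}
    for cls, words in class_members.items():
        counts = Counter()
        for word in words:
            for zh in translations.get(word, []):
                counts.update(zh)  # counts each Chinese character
        total = sum(counts.values())
        probs[cls] = {ch: n / total for ch, n in counts.items()} if total else {}
    return probs


def rank_translations(word, cls, translations, probs, top_k=3):
    """Rank a word's candidate translations for one class.

    Assumed scoring rule: mean P(character | class) over the
    characters of the candidate.
    """
    table = probs.get(cls, {})

    def score(zh):
        return sum(table.get(ch, 0.0) for ch in zh) / len(zh)

    candidates = [zh for zh in translations.get(word, []) if zh]
    return sorted(candidates, key=score, reverse=True)[:top_k]


PROBS = train_char_probs(CLASS_MEMBERS, TRANSLATIONS)
print(rank_translations("plant", "flora", TRANSLATIONS, PROBS))
print(rank_translations("plant", "facility", TRANSLATIONS, PROBS))
```

    On this toy data, 植物 ranks first for the "flora" class and 工廠 ranks first for the "facility" class, because the other members of each class contribute the shared characters.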

    The evaluation shows that our approach achieves 76.43% coverage. The recall rate is 70%, 80%, and 90% when the top 1, top 2, and top 3 translations are used, respectively.
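
    The two measures reported above can be computed as in the following sketch. The definitions are assumptions, since this record does not state them: coverage is taken as the share of test items for which at least one translation is proposed, and top-k recall as the share of covered items whose reference translation appears among the first k candidates. The test items and reference translations are made up for illustration.

```python
# Hypothetical ranked candidates per (word, class) test item.
PREDICTIONS = {
    ("plant", "flora"): ["植物", "工廠"],
    ("plant", "facility"): ["植物", "工廠"],
    ("bank", "institution"): [],  # no candidate proposed
    ("spring", "season"): ["泉", "彈簧", "春天"],
}

# Hypothetical reference translation for each test item.
GOLD = {
    ("plant", "flora"): "植物",
    ("plant", "facility"): "工廠",
    ("bank", "institution"): "銀行",
    ("spring", "season"): "春天",
}


def coverage(predictions):
    """Share of test items with at least one proposed translation."""
    if not predictions:
        return 0.0
    covered = sum(1 for cands in predictions.values() if cands)
    return covered / len(predictions)


def recall_at_k(predictions, gold, k):
    """Share of covered items whose reference is in the top k candidates."""
    covered = {key: cands for key, cands in predictions.items() if cands}
    if not covered:
        return 0.0
    hits = sum(1 for key, cands in covered.items() if gold[key] in cands[:k])
    return hits / len(covered)


print(coverage(PREDICTIONS))
for k in (1, 2, 3):
    print(k, recall_at_k(PREDICTIONS, GOLD, k))
```

    Computing recall over covered items only is a design choice made here; dividing by all test items instead would lower the recall figures whenever coverage is below 100%.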

    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Method and Problems
        1.1.1 WordNet in a Nutshell
        1.1.2 Assigning Chinese Translation to Synsets
      1.2 Organization of the Thesis
    Chapter 2 Related Researches
    Chapter 3 Statistical Model for Lexical Translation
      3.1 Framework
      3.2 The Model
      3.3 Applying Class-based Translation Model
    Chapter 4 Experimental Results and Discussion
      4.1 Classifying Nouns
      4.2 Calculating Translation Probability
      4.3 Evaluation
      4.4 Demonstration
      4.5 Discussion
    Chapter 5 Conclusion and Future Work
    Reference
    Appendix I - 500 Test Data

