
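Author: 廖彥盛 (Liao, Yan-Sheng)
Thesis title: Automatically Translating WordNet Based on Ontology and Collocation Information (利用知識本體分類和搭配詞資訊自動翻譯WordNet)
Advisor: 張俊盛 (Chang, Jason S.)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2010
Academic year of graduation: 98
Language: English
Number of pages: 67
Chinese keywords: WordNet, 本體關係 (ontological relation), 多語系WordNet (multilingual WordNet), 搭配詞 (collocation), 詞義 (word sense)
English keywords: WordNet, ontology relation, multilingual WordNet, collocation, sense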
    WordNet is a well-known English lexical database widely used in language-related research. However, manually compiling such a dictionary for the languages of the world is time-consuming and labor-intensive. In this paper, we propose an automatic method of translating WordNet into languages other than English. In our approach, we use the ontological relations in WordNet and relevant collocations to obtain appropriate translations for a given WordNet synset. The method involves automatically identifying monosemous lemmas of the synset, acquiring relevant collocations, translating relevant words and phrases, and extracting translations of the synset. We evaluate the proposed method on a set of WordNet synsets based on human judgment. The experimental results show that the proposed method significantly outperforms a baseline that takes the most frequent translations.


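    Chinese abstract (translated): WordNet is a widely known English semantic dictionary that is used extensively in natural language processing research. However, manually compiling WordNets for other languages takes a great deal of time and labor. In this thesis, we propose an automatic method for translating WordNet into languages other than English. To translate a given WordNet synset, the method uses the ontological relations contained in WordNet and collocation information related to the synset. It consists of automatically identifying the monosemous lemmas in the synset, automatically acquiring relevant collocations, automatically translating the relevant words and phrases, and finally automatically extracting appropriate translations for the synset. We randomly select a set of WordNet synsets for the experiment and verify the results manually. The experimental results show that, compared with a baseline that directly takes the most frequent translations of the synset, the proposed method produces correct translations more accurately.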
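
    The steps named in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the sense inventory, the collocation list, the bilingual dictionary, and the function names `monosemous_lemmas` and `rank_translations` are all hypothetical, and simple vote counting stands in for whatever extraction procedure the thesis actually uses. It only shows why evidence from monosemous lemmas and collocations can point to a different translation than the most frequent one.

```python
from collections import Counter

# Toy sense inventory (hypothetical data, not from the thesis):
# synset id -> lemmas belonging to that synset.
SYNSETS = {
    "bank.n.01": ["bank", "depository_financial_institution"],
    "bank.n.02": ["bank", "riverbank"],
}

# Hypothetical collocations assumed to be associated with each synset.
COLLOCATIONS = {
    "bank.n.01": ["bank account"],
    "bank.n.02": ["river bank"],
}

# Toy bilingual dictionary (hypothetical): English expression -> translations,
# listed with the most frequent translation first.
TRANSLATIONS = {
    "bank": ["銀行", "河岸"],
    "riverbank": ["河岸"],
    "river bank": ["河岸"],
    "depository_financial_institution": ["銀行"],
    "bank account": ["銀行帳戶"],
}


def monosemous_lemmas(synset_id, synsets):
    """Return the lemmas of a synset that occur in no other synset."""
    counts = Counter(lemma for lemmas in synsets.values() for lemma in lemmas)
    return [lemma for lemma in synsets[synset_id] if counts[lemma] == 1]


def rank_translations(synset_id, synsets, collocations, dictionary):
    """Rank candidate translations by how many pieces of evidence
    (monosemous lemmas and collocations) produce them. Vote counting is an
    assumption made for this sketch."""
    evidence = monosemous_lemmas(synset_id, synsets) + collocations.get(synset_id, [])
    votes = Counter()
    for expression in evidence:
        for translation in dictionary.get(expression, []):
            votes[translation] += 1
    return votes.most_common()


# Baseline for comparison: the most frequent translation of the ambiguous lemma.
baseline = TRANSLATIONS["bank"][0]
ranked = rank_translations("bank.n.02", SYNSETS, COLLOCATIONS, TRANSLATIONS)
```

    In this toy data the lemma "bank" is ambiguous, so the baseline returns the financial sense for both synsets, while the evidence for the riverside synset comes only from "riverbank" and "river bank".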

    Chinese Abstract
    Abstract
    Acknowledgement
    Table of Contents
    List of Figures
    List of Tables
    CHAPTER 1 Introduction
    CHAPTER 2 Related Work
    CHAPTER 3 Method
        3.1 Problem Statement
        3.2 Representative Lemmas and Collocations
    CHAPTER 4 Experimental Settings
        4.1 Data Set
        4.2 Method Compared
        4.3 Evaluation Metrics
        4.4 Tuning Parameters
    CHAPTER 5 Evaluation Results and Discussion
        5.1 Experimental Results
        5.2 Error Analysis
    CHAPTER 6 Conclusions and Future Work
    Reference
    Appendix A - Evaluation Data
    Appendix B - Sample Output

