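Graduate student: | 粘子奕 Nien, Tzu-yi |
---|---|
Thesis title: | 透過階層式翻譯分類擴充雙語WordNet Extending Bilingual WordNet via Hierarchical Word Translation Classification |
Advisors: | 張俊盛 Chang, Jason S.; 張智星 Jang, Jyh-Shing Roger |
Oral examination committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
Year of publication: | 2009 |
Academic year of graduation: | 97 (ROC calendar, i.e., 2008-2009) |
Language: | English |
Number of pages: | 55 |
Chinese keywords (translated): | translation sense selection, word sense disambiguation, bilingual WordNet, maximum entropy model |
English keywords: | word translation classification, word sense disambiguation, bilingual WordNet, maximum entropy model |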
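This thesis describes an automatic classification method that selects the appropriate word sense for word-translation pairs found in bilingual resources (e.g., bilingual dictionaries). The goal is to extend the lexical coverage of existing bilingual WordNets. Given a word and its translation, the method traverses the WordNet hyponym hierarchy from general to specific. At each step it selects the appropriate hyponym category, which gradually reduces the sense ambiguity. We build a classification model for every node in the hyponym hierarchy where the senses may branch. The models are trained on an existing bilingual WordNet, so that each one learns the features shared by the translations of the words under its hyponyms. At run time, the classifiers match these features to choose the more suitable hyponym node. We also build a filtering model that discards less probable senses, which improves both the speed and the precision of the system. Experimental results show that the system effectively selects the correct WordNet sense for a given word translation. The classification results can serve as additional training data for retraining the classification models. They can also be integrated into a machine translation system, so that it generates translations that more accurately reflect the intended meaning.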
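The general-to-specific traversal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The following are all hypothetical:

* the toy hierarchy (`HIERARCHY`);
* the sense labels such as `plant/flora`;
* the cue-overlap scorer, which stands in for the trained per-node classifiers;
* the filter function, which stands in for the filtering model.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical, heavily simplified fragment of a hyponym hierarchy:
# node -> hyponym children. Nodes without an entry are leaves that
# stand in for individual word senses.
HIERARCHY: Dict[str, List[str]] = {
    "entity": ["object", "abstraction"],
    "object": ["organism", "artifact"],
    "abstraction": ["measure"],
    "organism": ["plant/flora"],
    "artifact": ["plant/factory"],
}

Scorer = Callable[[Set[str], str], float]  # (features, child) -> score
Filter = Callable[[Set[str], str], bool]   # (features, child) -> keep?


def leaves(node: str) -> Set[str]:
    """All leaf senses reachable at or below a node."""
    children = HIERARCHY.get(node, [])
    if not children:
        return {node}
    return set().union(*(leaves(c) for c in children))


def classify(root: str, senses: Set[str], feats: Set[str],
             scorers: Dict[str, Scorer], keep: Filter) -> Optional[str]:
    """Walk from general to specific, picking one hyponym per step."""
    node = root
    while HIERARCHY.get(node):
        # Keep only children that still lead to a sense of the source word
        # and that the filtering model does not reject.
        candidates = [c for c in HIERARCHY[node]
                      if leaves(c) & senses and keep(feats, c)]
        if not candidates:
            return None
        if len(candidates) == 1:
            node = candidates[0]  # no ambiguity at this node
        else:
            # Branching node: consult this node's own classifier.
            scorer = scorers[node]
            node = max(candidates, key=lambda c: scorer(feats, c))
    return node if node in senses else None


# Hypothetical stand-in for trained classifiers: count translation
# characters that match cue characters associated with each hyponym.
CUES: Dict[str, Set[str]] = {"organism": {"植", "物"}, "artifact": {"工", "廠"}}


def cue_overlap(feats: Set[str], child: str) -> float:
    return float(len(feats & CUES.get(child, set())))


SCORERS: Dict[str, Scorer] = {"entity": cue_overlap, "object": cue_overlap}


def keep_all(feats: Set[str], child: str) -> bool:
    return True  # a filter that rejects nothing


PLANT = {"plant/factory", "plant/flora"}  # toy senses of English "plant"
print(classify("entity", PLANT, set("工廠"), SCORERS, keep_all))  # → plant/factory
print(classify("entity", PLANT, set("植物"), SCORERS, keep_all))  # → plant/flora
```

Nodes with a single surviving child need no classifier. This mirrors the idea of building models only for nodes where senses branch. The filter runs before scoring, so a rejected category never reaches a classifier.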
We introduce a method for learning to assign word senses to bilingual translation pairs. In our approach, the problem is recast as navigating a sense network (e.g., WordNet) in order to relate the features of translations to the sense nodes in the network. The method automatically constructs a classification model for each branching node in the sense network. It also learns to reject less probable sense categories for a translation, based on the translation characteristics of semantically related word groups (e.g., words in a lexical category). At run time, the given translations are expanded with their synonyms, and the trained classification models resolve the sense ambiguity. Evaluation shows that the method significantly outperforms the strong baseline of assigning the most frequent sense to each translation pair. The method effectively determines adequate word senses for given word-translation pairs. This suggests that it could serve as a computer-assisted tool for lexicography, or help machine translation systems with word selection.
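The keywords list a maximum entropy model, and the abstract says translations are expanded with their synonyms at run time. The sketch below shows what one per-node classifier might look like. It rests on several assumptions:

* character unigrams are used as features;
* `SYNONYMS` is a hypothetical synonym table;
* the training pairs are toy examples.

The thesis's actual feature set and training data are not described on this page.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table used for run-time expansion.
SYNONYMS: Dict[str, List[str]] = {"花木": ["植物"]}


def features(translation: str, synonyms: Dict[str, List[str]]) -> Set[str]:
    """Character unigrams of a translation and its synonyms (assumed feature set)."""
    words = [translation] + synonyms.get(translation, [])
    return {ch for w in words for ch in w}


class MaxEnt:
    """Multinomial maximum entropy classifier over binary features."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = labels
        self.w: Dict[Tuple[str, str], float] = {}  # (feature, label) -> weight

    def probs(self, feats: Set[str]) -> Dict[str, float]:
        scores = {y: sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, data: List[Tuple[Set[str], str]], epochs: int = 100,
              lr: float = 0.5, l2: float = 0.01) -> None:
        """Batch gradient ascent on the L2-regularised conditional log-likelihood."""
        for _ in range(epochs):
            grad: Dict[Tuple[str, str], float] = {}
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    for y in self.labels:
                        # Observed minus model-expected feature count.
                        obs = 1.0 if y == gold else 0.0
                        grad[(f, y)] = grad.get((f, y), 0.0) + obs - p[y]
            for k in set(grad) | set(self.w):
                w = self.w.get(k, 0.0)
                self.w[k] = w + lr * (grad.get(k, 0.0) - l2 * w)

    def predict(self, feats: Set[str]) -> str:
        p = self.probs(feats)
        return max(p, key=lambda y: p[y])


# Hypothetical training pairs for the branching node "object": translations of
# words whose senses lie under each hyponym (a bilingual WordNet would supply these).
TRAIN = [("植物", "organism"), ("動物", "organism"), ("工廠", "artifact"), ("工具", "artifact")]
model = MaxEnt(["organism", "artifact"])
model.train([(features(t, {}), y) for t, y in TRAIN])

# "花" and "木" never occur in training, so without expansion the model is
# uninformative; expanding with the synonym "植物" resolves the ambiguity.
print(model.probs(features("花木", {})))  # → {'organism': 0.5, 'artifact': 0.5}
print(model.predict(features("花木", SYNONYMS)))  # → organism
```

A maximum entropy model with binary indicator features is equivalent to multinomial logistic regression. An off-the-shelf logistic-regression trainer could therefore replace the hand-written gradient loop.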