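Extending Bilingual WordNet via Hierarchical Word Translation Classification | National Tsing Hua University Master's and Doctoral Theses Repository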

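Graduate student:	Nien, Tzu-yi (粘子奕)
Thesis title:	Extending Bilingual WordNet via Hierarchical Word Translation Classification (透過階層式翻譯分類擴充雙語WordNet)
Advisors:	Chang, Jason S. (張俊盛); Jang, Jyh-Shing Roger (張智星)
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications
Year of publication:	2009
Academic year of graduation:	97 (ROC calendar; 2008-2009)
Language:	English
Number of pages:	55
Keywords (Chinese):	翻譯詞意選擇 (translation sense selection), 字詞歧異辨識 (word sense disambiguation), 雙語WordNet (bilingual WordNet), 最大熵值模型 (maximum entropy model)
Keywords (English):	word translation classification, word sense disambiguation, bilingual WordNet, maximum entropy model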

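This thesis describes an automatic classification method that selects appropriate word senses for word-translation pairs in bilingual resources (e.g., bilingual dictionaries), thereby extending the lexical coverage of an existing bilingual WordNet. Given a word and its translation, the method automatically traverses the hyponym hierarchy in WordNet from general to specific, reducing sense ambiguity by selecting an appropriate hyponym category at each step. We build a classification model for every node in the hyponym hierarchy at which senses may diverge. The models are trained on an existing bilingual WordNet so that each one learns the features shared by the translations of the words beneath its hyponyms; at run time, the classifier can then select the more suitable hyponym node by matching features. In addition, we build a filtering model that removes less probable senses, improving the speed and precision of the system. Experimental results show that the system effectively selects the correct WordNet sense for a given word-translation pair. The classification results can serve as training data for retraining the classification models, or they can be combined with a machine translation system so that it produces translations that follow the intended meaning more closely.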

We introduce a method for learning to assign word senses to bilingual translation pairs. In our approach, this problem is recast as one of navigating a sense network (e.g., WordNet), with the aim of relating the features of translations to the sense nodes in the network. The method involves automatically constructing a classification model for each branching node in the sense network and learning to reject less probable sense categories for the translations, based on the translation characteristics of semantically related word groups (e.g., words in a lexical category). At run time, the given translations are expanded with their synonyms, and the sense ambiguity is resolved according to the trained classification models. Evaluation shows that the method significantly outperforms the strong baseline of assigning the most frequent sense to the translation pairs. Our method effectively determines adequate word senses for given word-translation pairs, suggesting that it could serve as a computer-assisted tool for lexicography or assist machine translation systems in word selection.
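The top-down navigation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy hierarchy, the training pairs, the function names (`propagate`, `child_scores`, `classify`), and the `min_prob` threshold are hypothetical and not taken from the thesis. The thesis trains maximum entropy models; this sketch swaps in a much simpler add-one-smoothed character-frequency score so that it runs without external libraries, and it omits the run-time synonym expansion step.

```python
from collections import Counter, defaultdict

# Toy hyponym hierarchy (hypothetical; not taken from WordNet or the thesis).
CHILDREN = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "bird"],
    "plant": ["tree", "flower"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}

# Toy training pairs: (leaf sense, Chinese translation).
TRAINING = [
    ("dog", "獵犬"), ("dog", "牧羊犬"),
    ("bird", "麻雀"), ("bird", "雲雀"),
    ("tree", "松樹"), ("tree", "橡樹"),
    ("flower", "菊花"), ("flower", "蘭花"),
]


def propagate(training):
    """Count translation characters at each sense node and at all its ancestors."""
    counts = defaultdict(Counter)
    for sense, translation in training:
        node = sense
        while node is not None:
            counts[node].update(translation)
            node = PARENT.get(node)
    return counts


def child_scores(node, translation, counts):
    """Return normalized, add-one-smoothed scores for the children of `node`.

    This is a stand-in for a per-node classifier, not a maximum entropy model.
    """
    raw = {}
    for child in CHILDREN[node]:
        total = sum(counts[child].values())
        vocab = len(counts[child]) + 1  # +1 reserves mass for unseen characters
        score = 1.0
        for ch in translation:
            score *= (counts[child][ch] + 1) / (total + vocab)
        raw[child] = score
    norm = sum(raw.values())
    return {child: s / norm for child, s in raw.items()}


def classify(translation, counts, root="entity", min_prob=0.0):
    """Descend from `root`, picking the best child at each branching node.

    Descent stops early when the best child scores below `min_prob`,
    a crude analogue of rejecting improbable sense categories.
    """
    path = [root]
    node = root
    while node in CHILDREN:
        scores = child_scores(node, translation, counts)
        best = max(scores, key=scores.get)
        if scores[best] < min_prob:
            break
        path.append(best)
        node = best
    return path


counts = propagate(TRAINING)
print(classify("玫瑰花", counts))  # a translation not seen in training
```

Propagating counts upward means an inner node such as "plant" sees the characters of all translations below it, which is what lets an unseen translation be routed by the characters it shares with known ones.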

Chinese Abstract
Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables
CHAPTER 1 Introduction
CHAPTER 2 Related Work
CHAPTER 3 Hierarchical Word Translation Classification
3.1 Problem Statement
3.2 Learning to Classify Translations
3.2.1 Propagating Translations
3.2.2 Training Hierarchical Word Translation Classification Models
3.2.3 Training Filtering Model
3.3 Run-Time Translation Classification
CHAPTER 4 Experimental Setting
4.1 Data Set
4.2 Methods Compared
4.3 Evaluation Metrics
4.4 Tuning Parameters
CHAPTER 5 Evaluation Results and Discussion
5.1 Experimental Results
5.2 Error Analysis
CHAPTER 6 Future Work and Summary
References
Appendix A: WordNet Lexicographer File Names
Appendix B: Evaluation Data
Appendix C: Expandable Translation Pairs in Dev. Set

                                


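Full-text release date (campus network): full text not authorized for public release
Full-text release date (off-campus network): full text not authorized for public release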
