
Graduate student: Wu, Tai-Ju (吳岱儒)
Thesis title: def2topic: Learning to Classify Word Sense Definitions into Topics
Advisor: Chang, Jason S. (張俊盛)
Oral defense committee: 顏安孜, 蔡宗翰, 劉奕汶
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2021
Graduation academic year: 109 (ROC calendar)
Language: English
Number of pages: 29
Keywords: word sense disambiguation, topical classification
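    This thesis proposes a method for assigning topic labels to dictionary definitions. All topic labels are extracted from a synonym thesaurus. In our approach, word senses are converted into vectors. Machine learning and deep learning models then select suitable topic labels from the thesaurus. The method has three parts. It automatically extracts features that convert different definitions into vectors. It automatically determines the senses of the members of related word groups to generate training data. It automatically learns to classify each definition into topics. At run time, input definitions are converted into word embeddings and ranked by relevance using deep learning (DL) techniques. We present a prototype system, def2topic, that applies the method to the Cambridge English-Chinese Dictionary. Evaluation shows that the proposed system significantly outperforms the baseline.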
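Below is a minimal sketch of the training-data step described above, under stated assumptions. A bag-of-words vector stands in for the learned definition features. Each member word of a thesaurus group is assigned the sense whose definition is most cosine-similar to the combined definitions of the other members. This heuristic is hypothetical, as are the helper names (`vectorize`, `cosine`, `label_group`), the topic `WATERSIDE`, and the sample definitions. They are illustrations only, not the thesis's actual procedure or data.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the bag-of-words vectors are an assumed
# stand-in for the learned definition embeddings.
STOP = {"a", "an", "the", "of", "to", "or", "in", "that", "you", "from", "where"}


def vectorize(text):
    """Bag-of-words count vector for a definition, minus stop words."""
    tokens = (w.strip(".,;()").lower() for w in text.split())
    return Counter(t for t in tokens if t and t not in STOP)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_group(topic, members, senses):
    """For each member of a thesaurus group, pick the sense definition closest
    to the summed definitions of the other members; emit (definition, topic)."""
    pairs = []
    for word in members:
        if word not in senses:
            continue
        context = Counter()
        for other in members:
            if other != word:
                for definition in senses.get(other, []):
                    context.update(vectorize(definition))
        if not context:
            continue
        best = max(senses[word], key=lambda d: cosine(vectorize(d), context))
        pairs.append((best, topic))
    return pairs


# Hypothetical sense inventory: "bank" has a waterside sense and a finance sense.
senses = {
    "bank": ["the land along the side of a river",
             "an organization where people keep money"],
    "shore": ["the land along the edge of a sea, lake, or river"],
    "coast": ["the land next to or close to the sea"],
}
pairs = label_group("WATERSIDE", ["bank", "shore", "coast"], senses)
for definition, topic in pairs:
    print(topic, "<-", definition)
```

Here the river sense of *bank* wins because it shares *land*, *along*, and *river* with the other members' definitions. The finance sense shares nothing with them. The resulting (definition, topic) pairs would then serve as labeled training data.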


    We introduce a method for learning to assign multiple topic categories to a given sense definition, where the topic categories are extracted from a synonym thesaurus.
    In our approach, sense definitions are transformed into vectors. These vectors provide a similarity measure for disambiguating the synonyms in a given thesaurus, which in turn yields training data.
    The method involves three automatic steps: extracting features that convert definitions into vectors, determining the intended senses of the members of a group of related words to generate training data, and learning to classify definitions into topics.
    At run time, input definitions are transformed into embeddings and classified into topic categories.
    We present a prototype system, def2topic, that applies the method to the Cambridge English-Chinese Dictionary. Evaluation on two sets of sense definitions shows that the system significantly outperforms the baseline.
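The run-time step can be sketched in the same hedged way. The thesis ranks topics with machine learning and deep learning models. This illustration swaps in a much simpler nearest-centroid ranker. It sums the training definitions of each topic into a prototype vector, then scores an input definition against every prototype by cosine similarity. Returning the top k topics with a positive score reflects the goal of assigning multiple topic categories to one definition. The bag-of-words `vectorize`, the helpers `train_prototypes` and `rank_topics`, the cutoff `k=2`, and the data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list for the bag-of-words stand-in for embeddings.
STOP = {"a", "an", "the", "of", "to", "or", "in", "that", "you", "from", "where"}


def vectorize(text):
    """Bag-of-words count vector for a definition, minus stop words."""
    tokens = (w.strip(".,;()").lower() for w in text.split())
    return Counter(t for t in tokens if t and t not in STOP)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_prototypes(pairs):
    """Sum the definition vectors of each topic into one prototype vector."""
    prototypes = defaultdict(Counter)
    for definition, topic in pairs:
        prototypes[topic].update(vectorize(definition))
    return prototypes


def rank_topics(definition, prototypes, k=2):
    """Return up to k (topic, score) pairs with a positive score, best first."""
    vector = vectorize(definition)
    scores = {topic: cosine(vector, proto) for topic, proto in prototypes.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(topic, score) for topic, score in ranked[:k] if score > 0]


# Hypothetical (definition, topic) training pairs.
training = [
    ("the land along the side of a river", "WATERSIDE"),
    ("the land next to or close to the sea", "WATERSIDE"),
    ("an organization where people keep money", "FINANCE"),
    ("money that you borrow from a bank", "FINANCE"),
    ("a large area of salt water", "SEA"),
]
prototypes = train_prototypes(training)
result = rank_topics("money kept in a bank near the river", prototypes)
print(result)
```

Take the input *money kept in a bank near the river*. The finance prototype scores highest because it shares *money* and *bank*. The waterside prototype scores second on *river*, so both topics are returned. The sea prototype scores zero and is dropped.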

    Abstract
    Abstract (Chinese)
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    3 Methodology
    3.1 Problem Statement
    3.2 Learning to Classify Dictionary Definitions
    3.3 Run-time Definition Classification
    4 Experiment
    4.1 Training def2topic
    4.2 Systems Compared
    4.3 Evaluation
    5 Evaluation Results
    5.1 Results from the Monosemous Evaluation
    5.2 Results from the Polysemous Evaluation
    6 Conclusion and Future Work
    References

