
Author: Kuo, Chih-Yu (郭芝瑜)
Title: Resolving Word Sense Ambiguity for Assistive Reading (支援閱讀學習之詞彙語意解歧)
Advisor: Chang, Jason S. (張俊盛)
Committee Members: 陳浩然, 高宏宇, 吳鑑城, 白明弘
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Academic Year of Graduation: 109
Language: English
Number of Pages: 35
Keywords (Chinese): 詞義解歧
Keywords (English): Word Sense Disambiguation
Abstract (Chinese): This thesis presents a word sense disambiguation system that aims to support language learners' reading by analyzing the ambiguous words in an input article and returning the corresponding dictionary senses. We use a pre-trained language model to transform each target word in a sentence into a vector that carries contextual information. Our method converts dictionary example sentences into labeled training data, and then makes full use of this small amount of training data to automatically label new example sentences, producing additional training data. At run time, the vectors of the input are compared with those of the training data to select the most relevant sense. Experimental results show that, compared with the baseline, our method significantly improves disambiguation accuracy.


Abstract (English): We introduce a word sense disambiguation (WSD) system that assists language learners in reading by identifying ambiguous words in articles and returning the corresponding dictionary senses. In our approach, the target words in sentences are transformed into contextual embeddings with a pre-trained language model. The method involves transforming a small set of dictionary examples into training data, and then automatically expanding the training data with unannotated sentences based on the existing training data. At run time, the input sentences are transformed into embeddings and compared with the training data to determine the sense. Evaluation on a set of test articles shows that the proposed method significantly outperforms the baseline.
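A minimal sketch of how such a pipeline can be realized is shown below, assuming a BERT-style encoder loaded through the Hugging Face transformers library. The sense inventory for "bank", the example sentences, the mean-pooling of sub-word tokens, and the 0.8 similarity threshold are illustrative assumptions, not the thesis's actual data or implementation.

```python
# A minimal sketch of the pipeline described in the abstract, assuming a
# BERT-style encoder from the Hugging Face `transformers` library. The sense
# inventory for "bank", the example sentences, the mean-pooling of sub-word
# tokens, and the similarity threshold are illustrative assumptions only.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()


def target_word_embedding(sentence: str, target: str) -> torch.Tensor:
    """Contextual embedding of `target`: mean of its sub-word hidden states."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    piece_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(piece_ids) + 1):  # locate the target's sub-word span
        if ids[i:i + len(piece_ids)] == piece_ids:
            return hidden[i:i + len(piece_ids)].mean(dim=0)
    raise ValueError(f"'{target}' not found in: {sentence}")


# Hypothetical sense inventory: one dictionary example sentence per sense of "bank".
sense_clusters = {
    "bank (financial institution)": [
        target_word_embedding("She deposited the check at the bank on Monday.", "bank")
    ],
    "bank (side of a river)": [
        target_word_embedding("They had a picnic on the bank of the river.", "bank")
    ],
}


def cluster_similarity(vec: torch.Tensor, cluster: list) -> float:
    """Cosine similarity between a vector and the centroid of a sense cluster."""
    centroid = torch.stack(cluster).mean(dim=0)
    return torch.cosine_similarity(vec, centroid, dim=0).item()


def disambiguate(sentence: str, target: str) -> str:
    """Run-time step: return the sense whose cluster is closest to the input."""
    query = target_word_embedding(sentence, target)
    return max(sense_clusters, key=lambda s: cluster_similarity(query, sense_clusters[s]))


def expand_training_data(unannotated: list, target: str, threshold: float = 0.8) -> None:
    """Self-training step: add confidently matched new sentences to the nearest cluster."""
    for sentence in unannotated:
        vec = target_word_embedding(sentence, target)
        best = max(sense_clusters, key=lambda s: cluster_similarity(vec, sense_clusters[s]))
        if cluster_similarity(vec, sense_clusters[best]) >= threshold:
            sense_clusters[best].append(vec)


expand_training_data(["Fishermen lined the muddy bank of the stream at dawn."], "bank")
print(disambiguate("He rowed the boat toward the bank.", "bank"))
```

Comparing against a cluster centroid rather than a single example sentence lets newly added sentences refine each sense representation, mirroring the embedding clusters and training-data expansion listed under Sections 3.2.4 and 3.2.5 of the table of contents below.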

Table of Contents:
Abstract
Abstract (Chinese)
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
3 Methodology
3.1 Problem Statement
3.2 Constructing a Classification Model
3.2.1 Converting Dictionary Examples into Training Data
3.2.2 Adding Examples with Relevant Translation
3.2.3 Adding Wikipedia Examples Relevant to Definition
3.2.4 Transforming Examples into Embedding Clusters
3.2.5 Expanding Training Data
3.3 Runtime Disambiguation
4 Experiments
4.1 Sense Inventory and Training Data
4.2 Models Compared
4.3 Test Data
4.4 Evaluation Metrics
5 Evaluation Results
6 Conclusion and Future Work
References

