
Graduate Student: Lee, Kuan-Lin (李冠霖)
Thesis Title: Learning Sense Embeddings from Definitions in Dictionaries (從詞典的定義中學習詞義嵌入向量)
Advisor: Chang, Jason S. (張俊盛)
Committee Members: Kao, Hung-Yu (高宏宇); Yen, An-Zi (顏安孜); Liu, Yi-Wen (劉奕汶); Su, Yi-Ching (蘇宜青)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of Publication: 2022
Academic Year of Graduation: 110
Language: English
Number of Pages: 32
Keywords (Chinese): 詞義嵌入向量、結合詞典、反向詞典
Keywords (English): Sense Embeddings, Combining Dictionaries, Reverse Dictionary
    This thesis proposes a method for learning sense embeddings that maps word senses defined in multiple dictionaries into vectors. We take a deep learning approach: an autoencoder maps each sense definition to a vector and is trained to maximize the probability of reconstructing the definition from that vector, so that the resulting vector reflects the sense definition. The method involves training the autoencoder, automatically aligning the sense definitions of multiple dictionaries, and automatically mapping an arbitrary description into the sense embedding space. Experimental results show that our method obtains better results than the baseline.
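To make the training objective above concrete, here is a minimal sketch assuming a small GRU encoder-decoder in PyTorch; the DefinitionAutoencoder class, its dimensions, and the toy token ids are hypothetical illustrations, not the architecture actually used in the thesis.

```python
# Hypothetical sketch of a definition autoencoder: encode a definition into one
# vector, then train by maximizing the probability of reconstructing the
# definition (i.e. minimizing token-level cross-entropy). Sizes are arbitrary.
import torch
import torch.nn as nn

class DefinitionAutoencoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, sense_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, sense_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, sense_dim, batch_first=True)
        self.out = nn.Linear(sense_dim, vocab_size)

    def encode(self, definition_ids):
        # The final encoder state is used as the sense embedding.
        _, hidden = self.encoder(self.embed(definition_ids))
        return hidden.squeeze(0)

    def forward(self, definition_ids):
        sense_vec = self.encode(definition_ids)                 # (batch, sense_dim)
        # Teacher-forced decoding conditioned on the sense vector:
        # predict each next token of the definition.
        dec_out, _ = self.decoder(self.embed(definition_ids[:, :-1]),
                                  sense_vec.unsqueeze(0))
        return self.out(dec_out)                                # (batch, len-1, vocab)

# Toy training step on random token ids standing in for a tokenized definition.
model = DefinitionAutoencoder(vocab_size=10000)
ids = torch.randint(0, 10000, (4, 12))
logits = model(ids)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 10000), ids[:, 1:].reshape(-1))
loss.backward()
```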


    We introduce a method for learning to embed word senses as defined in a given set of dictionaries. In our approach, sense definition pairs, <word, definition>, are transformed into low-dimensional vectors aimed at maximizing the probability of reconstructing the definitions in an autoencoding setting. The method involves automatically training a sense autoencoder for encoding sense definitions, automatically aligning sense definitions, and automatically generating embeddings of arbitrary descriptions. At run-time, queries from users are mapped to the embedding space and re-ranking is performed on the sense definitions retrieved. We present a prototype sense definition embedding system, SenseNet, that applies the method to two dictionaries. Blind evaluation on a set of real queries shows that the method significantly outperforms a baseline based on the Lesk algorithm. Our methodology clearly supports combining multiple dictionaries, resulting in additional improvement in representing sense definitions in dictionaries.
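As a rough illustration of the run-time step described above, the following sketch embeds a user query into the same space as the sense definitions and ranks candidates by cosine similarity before re-ranking; rank_senses, the vector sizes, and the toy data are assumptions made for this example, not the exact SenseNet retrieval pipeline.

```python
# Hypothetical run-time retrieval: the query vector comes from the same encoder
# that produced the sense embeddings; candidates are scored by cosine similarity
# and the top-k are kept for re-ranking. Names and data here are illustrative.
import numpy as np

def rank_senses(query_vec, sense_vecs, senses, top_k=5):
    """Return the top_k (word, definition) pairs closest to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    m = sense_vecs / np.linalg.norm(sense_vecs, axis=1, keepdims=True)
    scores = m @ q                                   # cosine similarity per sense
    order = np.argsort(-scores)[:top_k]
    return [(senses[i], float(scores[i])) for i in order]

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
senses = [("pine", "an evergreen coniferous tree"),
          ("cone", "a solid that narrows to a point from a circular base")]
sense_vecs = rng.normal(size=(len(senses), 256))
query_vec = rng.normal(size=256)                     # would come from encoding the query
print(rank_senses(query_vec, sense_vecs, senses, top_k=2))
```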

    Abstract
    摘要 (Abstract in Chinese)
    致謝 (Acknowledgements)
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    3 Methodology
      3.1 Problem Statement
      3.2 Learning to Transform Sense Definitions into Vectors
        3.2.1 Gathering Senses from Dictionaries
        3.2.2 Training Sense Autoencoder
        3.2.3 Aligning Sense Definitions
        3.2.4 Generating Sense Embeddings
      3.3 Run-Time Sense Embeddings
    4 Experimental Setting
      4.1 Training SenseNet
      4.2 Systems Compared
      4.3 Evaluation Metrics
    5 Results and Discussion
      5.1 Results from the Alignment Evaluation
      5.2 Results from the Reverse Dictionary Evaluation
    6 Conclusion and Future Work
    References

    1. Steven Bird, Ewan Klein, and Edward Loper. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc., 2009.
    2. Tom Bosc and Pascal Vincent. Auto-encoding dictionary definitions into consistent word embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1522–1532, 2018.
    3. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
    4. Katrin Erk and Sebastian Padó. A structured vector space model for word meaning in context. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 897–906, 2008.
    5. Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke S. Zettlemoyer. AllenNLP: A deep semantic natural language processing platform. 2017.
    6. Michael A Hedderich, Andrew Yates, Dietrich Klakow, and Gerard De Melo. Using multi-sense vector embeddings for reverse dictionaries. arXiv preprint arXiv:1904.01451, 2019.
    7. Felix Hill, KyungHyun Cho, Anna Korhonen, and Yoshua Bengio. Learning to understand phrases by embedding the dictionary. Transactions of the Association for Computational Linguistics, 4:17–30, 2016.
    8. Matthew Honnibal and Ines Montani. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear, 2017.
    9. Dimitri Kartsaklis, Mohammad Taher Pilehvar, and Nigel Collier. Mapping text to knowledge graph entities using multi-sense LSTMs. arXiv preprint arXiv:1808.07724, 2018.
    10. Barbara Ann Kipfer. Flip Dictionary. Writer’s Digest, 2001.
    11. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
    12. Stanislas Lauly, Alex Boulanger, and Hugo Larochelle. Learning multilingual word representations using a bag-of-words autoencoder. arXiv preprint arXiv:1401.1803, 2014.
    13. Michael Lesk. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international conference on Systems documentation, pages 24–26, 1986.
    14. Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. Learning context-sensitive word embeddings with neural tensor skip-gram model. In Twenty-fourth international joint conference on artificial intelligence, 2015.
    15. Qi Liu, Matt J Kusner, and Phil Blunsom. A survey on contextual embeddings. arXiv preprint arXiv:2003.07278, 2020.
    16. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
    17. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
    18. Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, and Oleksandr Skurzhanskyi. GECToR – grammatical error correction: tag, not rewrite. arXiv preprint arXiv:2005.12592, 2020.
    19. Jeffrey Pennington, Richard Socher, and Christopher D Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.
    20. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In NAACL, 2018.
    21. Slav Petrov, Dipanjan Das, and Ryan McDonald. A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086, 2011.
    22. Mohammad Taher Pilehvar. On the importance of distinguishing word meaning representations: A case study on reverse dictionary mapping. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2151–2156, 2019.
    23. Julien Tissier, Christophe Gravier, and Amaury Habrard. Dict2vec: Learning word embeddings using lexical dictionaries. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 254–263, 2017.
    24. Tim Van de Cruys, Thierry Poibeau, and Anna Korhonen. Latent vector weighting for word meaning in context. In Empirical Methods in Natural Language Processing, 2011.
    25. David Yarowsky. Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics, 1992.
    26. Lei Zheng, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, and Maosong Sun. Multi-channel reverse dictionary model. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 312–319, 2020.
