| Graduate student: | 謝瑞峰 Hsieh, Jui-Feng |
|---|---|
| Thesis title: | IOW: 利用生成式AI構建學術同義詞典 In Other Words: Construction of Academic Thesaurus Using Generative AI |
| Advisor: | 張俊盛 Chang, Jason S. |
| Committee members: | 張智星 Jang, Jyh-Shing; 鍾曉芳 Chung, Siaw-Fong |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
| Year of publication: | 2024 |
| Academic year of graduation: | 113 |
| Language: | English |
| Number of pages: | 45 |
| Keywords (Chinese): | 自動生成同義辭典、同義詞生成、提示工程、生成式人工智慧、大型語言模型 |
| Keywords (English): | Automatic Thesaurus Construction, Synonym Generation, Prompt Engineering, Generative AI, Large Language Model |
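This thesis proposes a method for automatically constructing an academic thesaurus. We use generative AI to generate a list of synonyms for a given target word while ensuring that the synonyms sound sufficiently academic. The method adds three kinds of material to the prompts given to the generative AI: word pairs that human linguistic experts judge to be synonymous, an academic word list, and common collocations and example sentences retrieved from an academic corpus. We present a system, In Other Words, that produced a total of 13,573 synonyms for an academic word list of 855 words, comprising 3,613 verbs, 2,917 adjectives, 5,486 nouns, and 1,557 adverbs. Preliminary evaluation results show that the method performs well overall in generating academic synonyms.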
We introduce a new method for automatically constructing a thesaurus for academic writing. In our approach, each target word is converted into several prompts that ask a generative AI to produce a list of academic-sounding synonyms. The prompts are built from word pairs that expert linguists consider synonymous, a list of academic words, and lexical contexts of the target word retrieved from a scholarly corpus. We present a prototype system, In Other Words, that applies the method to an academic corpus and a generative AI tool. For a list of 855 academic words, it generated a total of 13,573 academic synonyms: 3,613 verbs, 2,917 adjectives, 5,486 nouns, and 1,557 adverbs. Preliminary human evaluation shows that the proposed method performs reasonably well in generating academic-sounding synonyms.
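The abstract names three prompt ingredients (expert-approved synonym pairs, an academic word list, and lexical contexts from a scholarly corpus) but does not give the prompt wording. The Python sketch below shows one way such a prompt might be assembled and how a model's reply might be cleaned up. Everything in it is an assumption made for illustration: the names `TargetContext`, `build_prompt`, and `parse_synonyms`, the prompt text, and the one-synonym-per-line reply format are hypothetical and are not taken from the thesis. No call to any generative AI service is made.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TargetContext:
    """Lexical context for one target word (hypothetical structure)."""
    word: str
    pos: str  # part of speech, e.g. "verb"
    collocations: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)


def build_prompt(ctx: TargetContext,
                 seed_pairs: list[tuple[str, str]],
                 academic_words: list[str],
                 max_candidates: int = 10) -> str:
    """Assemble a prompt from the three ingredients named in the abstract.

    The wording is illustrative only; the thesis's actual prompts are not
    reproduced here.
    """
    lines = [
        f"Suggest up to {max_candidates} academic-sounding synonyms "
        f"for the {ctx.pos} '{ctx.word}'.",
        "Word pairs that expert linguists consider synonymous:",
        "\n".join(f"- {a} = {b}" for a, b in seed_pairs),
        "Prefer words from this academic word list: " + ", ".join(academic_words),
        "Common collocations of the target word: " + ", ".join(ctx.collocations),
        "Example sentences from a scholarly corpus:",
        "\n".join(f"- {s}" for s in ctx.examples),
        "Return one synonym per line.",
    ]
    return "\n".join(lines)


def parse_synonyms(response: str, target: str) -> list[str]:
    """Clean a one-per-line reply: strip bullets, lowercase, drop the
    target word itself, and remove duplicates while keeping order."""
    seen: set[str] = set()
    result: list[str] = []
    for line in response.splitlines():
        candidate = line.strip().lstrip("-*• ").strip().lower()
        if candidate and candidate != target.lower() and candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result
```

Under these assumptions, a caller would build one `TargetContext` per entry in the academic word list, send the string returned by `build_prompt` to whichever generative AI tool is in use, and pass the reply through `parse_synonyms` before storing the candidates.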