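Author: Wu, Hsiu-Tzu (吳㛢慈)
Thesis title: GramaConc: A Concordancer for Grammar Patterns and Phrases (GramaConc:英文語法與片語之檢索系統)
Advisor: Chang, Jason S. (張俊盛)
Committee members: Chen, Hao-Jan Howard (陳浩然); Hsu, Jane Yung-Jen (許永真); Liao, Posen (廖柏森); Su, Lily I-Wen (蘇以文)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: English
Number of pages: 54
Keywords (translated from Chinese): grammar rules, grammar detection, grammar retrieval, dependency parsing
Keywords (English): grammar pattern, pattern recognition, pattern concordance, dependency parsing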
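    In this thesis, we propose a new method for generating a grammar pattern concordance from a given English corpus; the concordance records the corpus's common words, grammar patterns, and phrases. We use an existing dependency parser for an initial syntactic analysis of each sentence. The analysis parses the dependency relations between words, converts those relations into syntactic structures consistent with the grammar patterns in a knowledge base, and then counts the occurrences of keywords, grammar patterns, and phrases in the corpus. We also apply the method to a large-scale academic corpus, S2ORC, producing a prototype concordance, GramaConc. The concordance contains statistics and source references for more than one million grammar patterns, and a preliminary evaluation shows that the method generates large-scale grammar pattern instances with fairly high accuracy. In sum, we combine an existing dependency parser with a knowledge base of grammar patterns to derive rich linguistic information, which may be applied in future research on language learning, grammatical error correction, assisted academic writing, and dependency parser improvement.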


    We introduce a method for generating a concordance of a given corpus, indexed by common words, phrases, and grammar patterns. In our approach, the sentences in the corpus are transformed into patterns using an existing dependency parser and a knowledge base of grammar patterns. The method involves parsing the sentences into dependency relations, converting the dependency relations into grammar patterns consistent with the pattern knowledge base, and calculating the indexes and frequency counts of words, phrases, and grammar patterns. We present a prototype concordance system, GramaConc, that applies the method to a large-scale scientific research corpus, S2ORC, resulting in 100 million instances of grammar patterns. Preliminary evaluation shows that the method produces comprehensive information about grammar patterns. Our methodology cleanly supports combining a statistical parser and a knowledge base to derive quantitative linguistic information, resulting in a comprehensive dataset potentially useful for language learning, grammatical error correction, assisted academic writing, and dependency parsing.
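
    The three steps named in the abstract (dependency relations, conversion to knowledge-base patterns, indexing and counting) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the `Token` layout, the dependency labels, the toy `PATTERN_KB`, and the conversion rules in `verb_pattern` are all hypothetical, and the parses are written by hand instead of coming from a real dependency parser.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    lemma: str
    pos: str   # coarse part of speech, e.g. "VERB"
    head: int  # index of the head token (-1 for the root)
    dep: str   # dependency label, e.g. "dobj"


# Hypothetical, deliberately incomplete knowledge base of licensed patterns.
PATTERN_KB = {"decide": {"V", "V to-inf", "V on n", "V that"}}


def verb_pattern(sent, i):
    """Map the right-hand dependents of the verb at index i to a pattern string."""
    parts = ["V"]
    for j, tok in enumerate(sent):
        if tok.head != i or j < i:
            continue  # keep only dependents that follow the verb
        if tok.dep == "dobj":
            parts.append("n")
        elif tok.dep == "xcomp" and any(t.head == j and t.lemma == "to" for t in sent):
            parts.append("to-inf")
        elif tok.dep == "prep" and any(t.head == j and t.dep == "pobj" for t in sent):
            parts.append(f"{tok.lemma} n")
        elif tok.dep == "ccomp":
            parts.append("that")  # simplification: any clausal complement
    return " ".join(parts)


def build_concordance(corpus):
    """Count (lemma, pattern) pairs and record the sentences they occur in."""
    counts = Counter()
    index = defaultdict(list)
    for sent_id, sent in enumerate(corpus):
        for i, tok in enumerate(sent):
            if tok.pos != "VERB":
                continue
            pattern = verb_pattern(sent, i)
            # Keep only patterns that the knowledge base licenses for this lemma.
            if pattern in PATTERN_KB.get(tok.lemma, ()):
                counts[(tok.lemma, pattern)] += 1
                index[(tok.lemma, pattern)].append(sent_id)
    return counts, index


# Hand-written parses standing in for dependency parser output.
CORPUS = [
    [Token("We", "we", "PRON", 1, "nsubj"), Token("decided", "decide", "VERB", -1, "ROOT"),
     Token("to", "to", "PART", 3, "aux"), Token("leave", "leave", "VERB", 1, "xcomp"),
     Token(".", ".", "PUNCT", 1, "punct")],
    [Token("They", "they", "PRON", 1, "nsubj"), Token("decided", "decide", "VERB", -1, "ROOT"),
     Token("on", "on", "ADP", 1, "prep"), Token("a", "a", "DET", 4, "det"),
     Token("plan", "plan", "NOUN", 2, "pobj"), Token(".", ".", "PUNCT", 1, "punct")],
    [Token("She", "she", "PRON", 1, "nsubj"), Token("decided", "decide", "VERB", -1, "ROOT"),
     Token("to", "to", "PART", 3, "aux"), Token("stay", "stay", "VERB", 1, "xcomp"),
     Token(".", ".", "PUNCT", 1, "punct")],
    [Token("He", "he", "PRON", 1, "nsubj"), Token("decided", "decide", "VERB", -1, "ROOT"),
     Token("against", "against", "ADP", 1, "prep"), Token("it", "it", "PRON", 2, "pobj"),
     Token(".", ".", "PUNCT", 1, "punct")],
]

counts, index = build_concordance(CORPUS)
for (lemma, pattern), n in counts.most_common():
    print(lemma, pattern, n, index[(lemma, pattern)])
```

    In this toy setup the fourth sentence yields a pattern that the toy knowledge base does not list, so it is left out of the index. That mirrors the idea of keeping only patterns consistent with the knowledge base, although the actual filtering in GramaConc may work differently.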

    Abstract
    摘要 (Abstract in Chinese)
    致謝 (Acknowledgements)
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    3 Methodology
        3.1 Problem Statement
        3.2 Building a Pattern Concordance
            3.2.1 Analyzing Parse Trees from Sentences
            3.2.2 Identifying Patterns in a Parse Tree
            3.2.3 Reformatting Patterns
            3.2.4 Building a Concordance
        3.3 Run-Time Parsing and Searching
    4 Experimental Setting
        4.1 GramaConc Setup
        4.2 Evaluation Metrics
        4.3 Evaluation Word and Patterns
    5 Results and Discussion
        5.1 Concordance Results of a Given Word
        5.2 Evaluation on a Given Word
        5.3 Evaluation on Sentences with Different Length
    6 Conclusion and Future Work
    References
    Appendices
        A Implemented Patterns
        B Evaluation Sentences
            B.1 Decide
            B.2 Short Sentences
            B.3 Long Sentences

    1. Baisa, V., & Suchomel, V. (2014). SkELL: Web interface for English language learning. RASLAN.
    2. Boisson, J., Kao, T.-H., Wu, J.-C., Yen, T.-H., & Chang, J. S. (2013). Linggle: A web-scale linguistic search engine for words in context. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 139–144. https://aclanthology.org/P13-4024
    3. Chen, Y.-Z. (2020). Learning to extract bilingual grammar patterns (Master’s thesis). National Tsing Hua University. Hsinchu, ROC.
    4. Collins. (1996). English pattern grammar: Learn English grammar and basic sentence structure: Collins education. https://grammar.collinsdictionary.com/grammar-pattern
    5. Ellis, P. B., Hunston, S., & Manning, E. (1996). Collins COBUILD grammar patterns 1: Verbs. Collins COBUILD.
    6. Hunston, S., & Francis, G. (2000). Pattern grammar. John Benjamins Publishing.
    7. Hunston, S., Manning, E., & Francis, G. (1998). Collins COBUILD grammar patterns 2: Nouns and adjectives. Collins COBUILD.
    8. Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014). The Sketch Engine: Ten years on. Lexicography, 1, 7–36. https://doi.org/10.1007/s40607-014-0009-9
    9. Lo, K., Wang, L. L., Neumann, M., Kinney, R., & Weld, D. (2020). S2ORC: The semantic scholar open research corpus. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 4969–4983. https://doi.org/10.18653/v1/2020.acl-main.447
    10. Mason, O., & Hunston, S. (2004). The automatic recognition of verb patterns: A feasibility study. International Journal of Corpus Linguistics, 9(2), 253–270. https://doi.org/10.1075/ijcl.9.2.05mas
    11. Peng, C.-Q. (2019). Extracting Chinese lexical grammar patterns using dependency parsing (Master’s thesis). National Tsing Hua University. Hsinchu, ROC.
    12. Römer, U., Brook O’Donnell, M., & Ellis, N. C. (2015). Chapter 2. Using COBUILD grammar patterns for a large-scale analysis of verb-argument constructions. Corpora, grammar and discourse (pp. 43–72). John Benjamins Publishing Company.
    13. Yan, H., & Li, Y. (2021). A smart e-learning system for data-driven grammar learning. Smart education and e-learning 2021 (pp. 77–87). Springer Singapore. https://doi.org/10.1007/978-981-16-2834-4_7
    14. Yen, T.-H., Wu, J.-C., Chang, J., Boisson, J., & Chang, J. (2015). WriteAhead: Mining grammar patterns in corpora for assisted writing. Proceedings of ACL-IJCNLP 2015 System Demonstrations.
