
Author: 陳映竹 (Chen, Ying-Zhu)
Thesis title: 自動化產生同步雙語文法樣式 (Learning to extract Bilingual Grammar Patterns)
Advisor: 張俊盛 (Chang, Jason S.)
Committee members: 杜海倫 (Tu, Hai-Lun); 賴淑麗 (Lai, Shu-Li)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2020
Academic year of graduation: 108
Language: English
Number of pages: 40
Keywords (Chinese): 同步雙語文法樣式, 序列標註, 計算機輔助語言學習
Keywords (English): Bilingual Synchronous Grammar Patterns, Sequence Labeling, Computer Assisted Language Learning
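  • This thesis proposes a method that uses a sequence labeling model to automatically identify English grammar patterns and to extract synchronous bilingual grammar patterns, which can be used to support language learning. In our method, English example sentences are converted into words annotated with grammar pattern labels, which serve as training data for the sequence labeling model. The method involves training a sequence labeling model to automatically identify English grammar patterns, producing manually annotated data, building a word translation table, and designing a procedure that uses the annotated data and the translation table to extract bilingual grammar patterns. At run time, the system takes the word queried by the user and displays Chinese-English synchronous grammar patterns ranked by frequency of use, together with related example sentences. We present a prototype website, FamiliarPatterns, that helps language learners learn the correct grammar patterns of words. A preliminary evaluation on randomly selected example sentences shows that the method achieves reasonably good accuracy.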


    We introduce a method for automatically identifying English grammar patterns using sequence labeling and for extracting bilingual Synchronous Grammar Patterns (SGPs) to assist language learning. In our approach, English sentences are transformed into a sequence of words marked with grammar pattern labels, which is used to train a sequence labeling model. The method involves training a model to automatically identify English grammar patterns, generating annotated SGP data, creating a phrase table, and developing a method for extracting SGPs using the phrase table. At run-time, a user submits a query word, and the system suggests the corresponding English-Chinese synchronous grammar patterns, ranked by frequency, along with example sentences. We present a prototype, FamiliarPatterns, which applies the method to help learners adhere to correct word usage. Blind evaluation on a set of randomly sampled sentence pairs shows that the method performs reasonably well.
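    The two ideas in the abstract, marking words with grammar pattern labels and ranking patterns by frequency, can be illustrated with a minimal sketch. This is not the thesis's implementation: the BIO tagging scheme, the function names `bio_labels` and `rank_patterns`, the example sentence, and the Chinese pattern strings are all assumptions made for illustration, since the record does not state the label format actually used.

```python
from collections import Counter


def bio_labels(tokens, chunks):
    """Turn pattern-element spans into one label per token.

    chunks is a list of (start, end, element) with end exclusive.
    The BIO scheme is an assumption; the record does not specify one.
    """
    labels = ["O"] * len(tokens)
    for start, end, element in chunks:
        labels[start] = f"B-{element}"
        for i in range(start + 1, end):
            labels[i] = f"I-{element}"
    return labels


def rank_patterns(pattern_pairs):
    """Rank (English pattern, Chinese pattern) pairs by how often they occur."""
    return Counter(pattern_pairs).most_common()


# Hypothetical example: the verb "discuss" used in the pattern "V n with n".
tokens = ["We", "discussed", "the", "issue", "with", "him"]
chunks = [(1, 2, "V"), (2, 4, "n"), (4, 5, "with"), (5, 6, "n")]
labels = bio_labels(tokens, chunks)

# Hypothetical extracted pattern pairs for one query word.
pairs = [
    ("V n with n", "和 n V n"),
    ("V n", "V n"),
    ("V n with n", "和 n V n"),
]
ranked = rank_patterns(pairs)
```

    Here `labels` holds one tag per token, which is the usual input shape for a sequence labeling model, and `ranked` lists each pattern pair with its count, most frequent first.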

    Abstract
    Chinese Abstract (摘要)
    Acknowledgements (致謝)
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Work
    3 Methodology
      3.1 Problem Statement
      3.2 Extracting English Grammar Patterns
        3.2.1 Transforming Raw Data into a Training Dataset
        3.2.2 Training a Sequence Labeling Model
      3.3 Discovering Chinese Counterparts of English Grammar Patterns
        3.3.1 Building a Phrase Table
        3.3.2 Generating Annotated Data
        3.3.3 Aligning English Grammar Pattern to Chinese
      3.4 Selecting Representative SGPs and Examples
    4 Experiments and Evaluation
      4.1 Datasets
      4.2 Training process
      4.3 Evaluation and Discussion
        4.3.1 Evaluation Metrics
        4.3.2 Evaluation Results and Discussion
    5 Conclusion and Future Work
    Reference

