
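Graduate student: Tzu-Wei Wu (吳紫葦)
Thesis title: Extraction of Multiword Expressions related to Grammatical Collocation Based on Syntactic and Statistical Information
Original title: 利用句法與統計之文法搭配與多字詞語之擷取
Advisor: Jason S. Chang (張俊盛)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 92
Chinese keywords: 多字詞語 (multiword expressions), 文法搭配詞 (grammatical collocations), 相互資訊值 (mutual information value)
English keywords: multiword expressions, grammatical collocations, mutual information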
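  • This thesis studies multiword expressions related to grammatical collocations and proposes a method for automatically extracting grammatical collocations from a corpus. The method uses part-of-speech tagging and base-phrase analysis to select, on syntactic grounds, word combinations that match specific structures. It then measures the association between the collocates statistically and keeps the combinations likely to form meaningful grammatical collocations. Next, it learns fixed part-of-speech patterns and a length distribution from known multiword expressions. Using these, it searches the filtered collocations, such as (“at”, “cost”), for meaningful multiword expressions that fit the patterns, such as “at cost” or “at all costs”.

    To verify statistically whether a selected candidate is a meaningful combination, we use the mutual information measure. A score is computed for each collocation candidate and serves as an indicator of how strongly the collocates are associated. In the experiments, a trained threshold filters out combinations with low mutual information, and the remaining meaningful collocations are used to extract multiword expressions.

    The method extracts multiword expressions related to grammatical collocations and, building on the dictionary definition, adds a “preposition-noun-preposition” pattern. For English learning, this work can help learners understand how prepositions combine with content words, and it can supply common grammatical collocations that dictionaries lack. In the future, extracting translations of multiword expressions from bilingual corpora could strengthen computer-assisted language learning and also support computer-assisted translation.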


    This thesis studies multiword expressions (MWEs) related to grammatical collocations. We propose a method for automatically extracting grammatical collocations from a corpus. The method selects collocations that match certain structures, based on part-of-speech information and base-phrase analysis, and then extracts meaningful grammatical collocations through statistical analysis of association strength. In addition to statistics and linguistic knowledge, we rely on the syntactic patterns of multiword expressions. Take the collocation (“at”, “cost”), for example: the patterns of seed MWEs enable us to obtain multiword expressions such as “at cost” or “at all costs”.
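    The pattern-matching step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function name `extract_mwes`, the `(word, lemma, tag)` token format, the hand-assigned Penn Treebank-style tags, the `max_len` cutoff, and the two seed patterns are all assumptions made for the example. In the thesis the patterns and length distribution are learned from known multiword expressions.

```python
from collections import Counter

def extract_mwes(tagged_sentences, collocation, seed_patterns, max_len=4):
    """Collect spans that start at the preposition, end at the collocate noun,
    and whose POS sequence matches one of the seed patterns.

    tagged_sentences: list of sentences, each a list of (word, lemma, tag).
    collocation: (preposition_lemma, noun_lemma), e.g. ("at", "cost").
    seed_patterns: set of POS-tag tuples (hypothetical, for illustration).
    max_len: longest span, in tokens, that is considered (assumed cutoff).
    """
    prep, noun = collocation
    found = Counter()
    for sent in tagged_sentences:
        for i, (_, lemma, _) in enumerate(sent):
            if lemma != prep:
                continue
            # Look ahead for the collocate noun within the length limit.
            for j in range(i + 1, min(i + max_len, len(sent))):
                if sent[j][1] != noun:
                    continue
                span = sent[i:j + 1]
                pattern = tuple(tag for _, _, tag in span)
                if pattern in seed_patterns:
                    found[" ".join(word.lower() for word, _, _ in span)] += 1
    return found

# Toy, hand-tagged input (assumed data, not from the thesis corpus).
sentences = [
    [("We", "we", "PRP"), ("sell", "sell", "VBP"),
     ("at", "at", "IN"), ("cost", "cost", "NN")],
    [("Avoid", "avoid", "VB"), ("it", "it", "PRP"), ("at", "at", "IN"),
     ("all", "all", "DT"), ("costs", "cost", "NNS")],
    [("He", "he", "PRP"), ("looked", "look", "VBD"), ("at", "at", "IN"),
     ("the", "the", "DT"), ("rising", "rise", "VBG"), ("cost", "cost", "NN")],
]
seed_patterns = {("IN", "NN"), ("IN", "DT", "NNS")}

mwes = extract_mwes(sentences, ("at", "cost"), seed_patterns)
```

    On this toy input the first two sentences contribute “at cost” and “at all costs”, while “at the rising cost” is rejected because its tag sequence is not among the seed patterns.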

    We use mutual information (MI) to score each collocation candidate and filter out those whose MI falls below a threshold trained on real data. Collocations with MI above this lower bound are then used to assist in extracting multiword expressions.
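    As a rough sketch of this scoring and filtering step, the code below computes pointwise mutual information in the form log2(P(x, y) / (P(x) P(y))) and keeps pairs at or above a cutoff. The record does not give the exact formula or estimation details, so the pointwise form, the function names, the toy counts, and the cutoff of 0.5 are assumptions for illustration only. The appendix titles in this record mention trained thresholds of 2.2 and 5.2 for the two patterns.

```python
import math
from collections import Counter

def mutual_information(pair_counts, left_counts, right_counts, total):
    """Return {(left, right): MI} with MI = log2(P(x, y) / (P(x) * P(y))).

    Probabilities are simple relative frequencies over `total` observed pairs
    (an assumed estimator; the thesis may estimate them differently).
    """
    scores = {}
    for (left, right), n_xy in pair_counts.items():
        p_xy = n_xy / total
        p_x = left_counts[left] / total
        p_y = right_counts[right] / total
        scores[(left, right)] = math.log2(p_xy / (p_x * p_y))
    return scores

def filter_by_threshold(scores, threshold):
    """Keep only candidates whose MI is at least the threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}

# Toy candidate pairs (assumed data): 20 preposition-noun observations.
pairs = ([("at", "cost")] * 8 + [("at", "table")] * 2
         + [("of", "cost")] * 2 + [("of", "table")] * 8)
pair_counts = Counter(pairs)
left_counts = Counter(left for left, _ in pairs)
right_counts = Counter(right for _, right in pairs)

scores = mutual_information(pair_counts, left_counts, right_counts, len(pairs))
kept = filter_by_threshold(scores, threshold=0.5)  # illustrative cutoff
```

    Here (“at”, “cost”) scores log2(1.6) and survives the cutoff, whereas (“at”, “table”) scores log2(0.4), which is negative, and is filtered out.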

    The extracted grammatical collocations and related multiword expressions can be used in many natural language processing applications, including computer-assisted language learning, parsing, and machine translation.

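    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures and Tables
    Chapter 1 Introduction
    Chapter 2 Related Work
    Chapter 3 Method
      3.1 Problem Definition
      3.2 Part-of-Speech and Base-Phrase Analysis
      3.3 Extraction of Grammatical Collocations and Multiword Expressions
        3.3.1 Selecting Specific Combinations
        3.3.2 Statistical Information and Threshold Training
        3.3.3 Multiword Expressions and Analysis of Length Distribution and Part-of-Speech Patterns
      3.4 Run-Time Stage
    Chapter 4 Experiments and Evaluation
      4.1 Experimental Design
      4.2 Analysis of Part-of-Speech Patterns and Length Distribution
      4.3 Evaluation
    Chapter 5 Conclusion and Future Work
    References
    Appendix A - Grammatical Collocations
      (a) Top 100 “preposition-noun” collocations with MI above the threshold (2.2)
      (b) Top 100 “preposition-noun-preposition” collocations with MI above the threshold (5.2)
    Appendix B - Evaluation Data for Grammatical Collocations
      (a) 500 test items, of which 153 are “preposition-noun” collocations correctly labeled by the system
    Appendix C - Evaluation Data for Multiword Expressions of Grammatical Collocations
      (a) 153 “preposition-noun” collocations, yielding 4322 multiword expressions
      (b) 115 “preposition-noun-preposition” collocations, yielding 563 multiword expressions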


    Full text: not authorized for public release (on-campus network)
    Full text: not authorized for public release (off-campus network)
