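Graduate student: | 吳紫葦 Tzu-Wei Wu |
---|---|
Thesis title: | 利用句法與統計之文法搭配與多字詞語之擷取 Extraction of Multiword Expressions related to Grammatical Collocation Based on Syntactic and Statistical Information |
Advisor: | 張俊盛 Jason S. Chang |
Oral defense committee: | |
Degree: | Master (碩士) |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2006 |
Academic year of graduation: | 94 |
Language: | Chinese |
Number of pages: | 92 |
Keywords (Chinese): | 多字詞語 (multiword expressions), 文法搭配詞 (grammatical collocations), 相互資訊值 (mutual information value) |
Keywords (English): | multiword expressions, grammatical collocations, mutual information |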
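This thesis studies multiword expressions that are grammatical collocations and proposes a method for automatically extracting grammatical collocations from a corpus. The method uses part-of-speech and base-phrase analysis to select, on syntactic grounds, collocation combinations that fit specific structures, and uses a statistical measure of association to filter for those likely to form meaningful grammatical collocations. It then learns fixed part-of-speech patterns and length distributions from known multiword expressions and, starting from a filtered grammatical collocation such as (“at”, “cost”), further identifies meaningful multiword expressions that match these patterns, such as “at cost” or “at all costs”.
To verify statistically whether the selected candidates are meaningful combinations, we adopt the mutual information measure. A score is computed for each collocation candidate and serves as an indicator of the strength of association within the collocation. In the experiments, a trained threshold is used to filter out combinations with low mutual information, and the remaining meaningful collocations are used for multiword expression extraction.
This thesis can extract multiword expressions related to grammatical collocations and, building on the dictionary definition, additionally extends a “preposition-noun-preposition” pattern. For English learning, this research can help learners understand how prepositions combine with content words, and can supply commonly used grammatical collocations that are missing from dictionaries. In the future, if translations of multiword expressions can also be extracted from bilingual corpora, the results could both strengthen computer-assisted language learning and serve computer-assisted translation.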
This thesis studies multiword expressions (MWEs) related to grammatical collocations. We propose a method for automatically extracting grammatical collocations from a corpus. The method uses part-of-speech information and base-phrase analysis to select collocation candidates that fit certain syntactic structures, then applies a statistical measure of association to retain the meaningful grammatical collocations. In addition to statistical and linguistic knowledge, we rely on the syntactic patterns of known multiword expressions. Take the collocation (“at”, “cost”) as an example: patterns learned from seed MWEs enable us to obtain multiword expressions such as “at cost” or “at all costs”.
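The pattern-matching step can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the seed list, the Penn-Treebank-style tags, the `crude_lemma` helper, and the function names are all assumptions introduced here, and the length-distribution modelling mentioned in the Chinese abstract is reduced to "only try lengths seen in the seeds".

```python
from collections import Counter

# Hypothetical seed MWEs with POS tags (illustrative only, not from the thesis).
SEED_MWES = [
    [("at", "IN"), ("cost", "NN")],
    [("at", "IN"), ("all", "DT"), ("costs", "NNS")],
    [("in", "IN"), ("the", "DT"), ("long", "JJ"), ("run", "NN")],
]


def learn_patterns(seeds):
    """Count the POS-tag sequences observed in the seed expressions."""
    return Counter(tuple(tag for _, tag in seed) for seed in seeds)


def crude_lemma(word):
    """Assumed stand-in for a real lemmatizer: strip one plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def extract_mwes(tagged_sentence, pair, patterns):
    """Return spans that start with pair[0], end with a form of pair[1],
    and whose POS-tag sequence matches one of the seed patterns."""
    prep, noun = pair
    lengths = sorted({len(p) for p in patterns})
    found = []
    for start, (word, _) in enumerate(tagged_sentence):
        if word.lower() != prep:
            continue
        for n in lengths:
            span = tagged_sentence[start:start + n]
            if len(span) < n:
                continue  # span would run past the end of the sentence
            tags = tuple(tag for _, tag in span)
            if tags in patterns and crude_lemma(span[-1][0]) == noun:
                found.append(" ".join(w.lower() for w, _ in span))
    return found


patterns = learn_patterns(SEED_MWES)
sentence_a = [("We", "PRP"), ("must", "MD"), ("win", "VB"),
              ("at", "IN"), ("all", "DT"), ("costs", "NNS")]
sentence_b = [("It", "PRP"), ("was", "VBD"), ("sold", "VBN"),
              ("at", "IN"), ("cost", "NN")]
result_a = extract_mwes(sentence_a, ("at", "cost"), patterns)
result_b = extract_mwes(sentence_b, ("at", "cost"), patterns)
```

With these toy inputs, the collocation (“at”, “cost”) yields “at all costs” from the first sentence and “at cost” from the second. A real system would take its tags from a POS tagger and base-phrase chunker rather than from hand-tagged input.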
We use mutual information (MI) to score each collocation candidate and filter out those whose MI falls below a threshold trained on real data. Collocations whose MI exceeds this lower bound are then used to assist in the extraction of multiword expressions.
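The scoring and filtering step might look like the sketch below. It assumes the pointwise form of mutual information, log2(P(x, y) / (P(x) P(y))), estimated from raw co-occurrence counts; the abstract does not state the exact formula. The toy counts and the threshold of 0.5 are invented for illustration, whereas the thesis trains its threshold on data.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Score each (preposition, content word) pair by pointwise MI.

    `pairs` is a list of observed co-occurrences; probabilities are
    estimated by relative frequency (an assumption of this sketch).
    """
    pair_counts = Counter(pairs)
    left_counts = Counter(x for x, _ in pairs)
    right_counts = Counter(y for _, y in pairs)
    total = len(pairs)
    return {
        (x, y): math.log2(c * total / (left_counts[x] * right_counts[y]))
        for (x, y), c in pair_counts.items()
    }


def filter_by_threshold(scores, threshold):
    """Keep only candidates whose score reaches the threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}


# Toy co-occurrence data (hypothetical counts).
observations = (
    [("at", "cost")] * 4
    + [("at", "table")] * 1
    + [("on", "table")] * 5
)
scores = pmi_scores(observations)
kept = filter_by_threshold(scores, threshold=0.5)  # arbitrary threshold
```

Here (“at”, “cost”) scores log2(2) = 1 and survives the filter, while (“at”, “table”) scores below zero and is dropped. Pointwise MI is known to overrate rare pairs, so a practical system would usually also impose a minimum frequency.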
The grammatical collocations and related multiword expressions can be used in many natural language processing applications, including computer-assisted language learning, parsing, and machine translation.