Graduate student: 張裕嘉 (Chang, Yu-Chia)
Thesis title: 學術論文自動搭配詞建議之研究 (Automated Collocation Suggestion in Academic Writing)
Advisors: 張俊盛 (Chang, Jason S.); 劉顯親 (Liou, Hsien-Chin)
Oral defense committee:
Degree: Doctoral
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2010
Academic year of graduation: 98 (ROC calendar)
Language: English
Number of pages: 150
Chinese keywords: 學術論文寫作, 搭配詞, 自動建議, 機器學習, 分類器
English keywords: academic writing, collocation, automatic suggestion, machine learning, classifier
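    In recent years, the use of collocations has been widely discussed in foreign language teaching, and collocation is often described as a key to raising learners' language proficiency effectively. In academic writing, many scholars have also begun to attend to this linguistic phenomenon and have found that collocational knowledge affects the overall quality of academic writing.

    Previous studies indicate that effective collocation learning requires learners to become aware of the importance of collocations while learning vocabulary, so that they can gradually strengthen their collocation learning. This strategy, however, is sometimes hard to carry out in a real classroom. Instructors need sufficient collocational knowledge themselves and are also expected to give students timely advice on collocation use during teaching, yet giving such advice is time-consuming and labor-intensive. Providing collocation suggestions through language technology also faces obstacles: an appropriate suggestion requires an accurate understanding of semantic restrictions and must also take the pragmatic context into account. Collocation suggestion therefore remains an unsolved problem in language technology that needs further investigation.

    In this dissertation, to help learners obtain appropriate collocation assistance while writing academic papers, we explore whether a writing-assistance tool can be built with machine learning. Taking a sentence that needs a collocation suggestion as input, we train a classifier on data and treat its classification output as the basis for the suggestion. Choosing the features used for classification is crucial to building an effective collocation classifier. We therefore collect a corpus of academic texts and train the classifier on the contextual information surrounding the candidate words in the corpus, so that the classifier can automatically select the corresponding words as suggestions.

    We ran experiments on verb-noun collocations, a type that students often get wrong. The results show that a classifier trained on contextual information can indeed provide effective collocation suggestions with good suggestion rankings. They also indicate that the writing-assistance framework we propose for academic papers can provide the collocation suggestions needed in academic writing from multiple angles.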


    Collocation has been widely discussed in language teaching for decades, and it has been shown to be important in helping language learners achieve native-like fluency. In English for Academic Purposes, a growing number of researchers also recognize its importance in academic writing. It is often argued that collocation influences the effectiveness of a piece of writing and that a lack of collocational knowledge can cause a cumulative loss of precision.

    Previous research indicates that effective collocation acquisition requires learners to be aware of collocations while they learn vocabulary. However, this strategy may not be easy to apply in a real classroom. Language instructors need rich knowledge of collocation, and they also need to correct students' collocation errors, which is labor-intensive and time-consuming. Automating collocation suggestion through language technology also requires considerable effort, because a proper suggestion may depend on both the semantic and the pragmatic usage of the words involved. Collocation suggestion thus remains an unresolved problem in need of particular attention.

    In this thesis, we demonstrate the feasibility of using machine learning to build a writing assistant that automatically prompts learners with collocation suggestions in academic writing. We train a data-driven classifier and, given an input sentence that requires a collocation suggestion, treat the classifier's output as the suggested substitutions for the collocate in question. Feature selection is the key to a robust classifier, so we use the contextual linguistic clues around the target to elicit the most relevant suggestions from a reference corpus of scholarly texts.
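
    The idea of treating collocation suggestion as classification can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy training data, the bag-of-context-words features, and the naive Bayes learner are stand-ins, since the abstract does not say which features or which learning algorithm the thesis uses. The verb is the class label, the surrounding words are the features, and the ranked labels serve as suggestions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (context words around the verb slot, verb actually used).
TRAIN = [
    (["we", "a", "study", "on", "learners"], "conduct"),
    (["they", "an", "experiment", "with", "students"], "conduct"),
    (["we", "an", "experiment", "on", "corpora"], "conduct"),
    (["this", "paper", "a", "method", "for", "parsing"], "propose"),
    (["we", "a", "method", "for", "ranking"], "propose"),
    (["we", "a", "conclusion", "from", "data"], "draw"),
    (["they", "a", "conclusion", "from", "results"], "draw"),
]


def train(examples):
    """Count how often each context word co-occurs with each verb label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        label_counts[label] += 1
        for w in words:
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab


def suggest(context, model, top_k=3):
    """Rank candidate verbs by naive Bayes log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior of the verb
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in context:
            if w in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


model = train(TRAIN)
# Learner sentence "we made a conclusion from the results": the verb slot is
# blanked out and the remaining context words are classified.
ranked = suggest(["we", "a", "conclusion", "from", "the", "results"], model)
print(ranked)
```

    On this toy data the first suggestion is "draw", which would replace the miscollocate "made". A real system would need a large scholarly corpus and carefully chosen features, as the abstract emphasizes.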

    We carried out an experiment on verb-noun collocations, one of the major types of collocation problems. The proposed classifier, combined with contextual information, returned satisfactory suggestions and achieved the best hit rank in the experiment. Our framework for computer-assisted academic writing can facilitate learner-writers' collocation use and help them transfer that knowledge to their future writing.
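
    One common way to score ranked suggestions is mean reciprocal rank (MRR), sketched below. This is an assumption for illustration only: the abstract mentions a "hit rank" but does not define the metric, and the suggestion lists and gold answers are hypothetical.

```python
def reciprocal_rank(suggestions, gold):
    """Return 1/rank of the first suggestion in the gold set, or 0.0 if none hits."""
    for rank, word in enumerate(suggestions, start=1):
        if word in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(cases):
    """Average the reciprocal rank over all (suggestions, gold set) test cases."""
    return sum(reciprocal_rank(s, g) for s, g in cases) / len(cases)


# Hypothetical test cases: ranked suggestions and the set of acceptable verbs.
CASES = [
    (["draw", "reach", "make"], {"draw", "reach"}),  # hit at rank 1
    (["do", "conduct", "perform"], {"conduct"}),     # hit at rank 2
    (["take", "give", "put"], {"propose"}),          # no hit
]

mrr = mean_reciprocal_rank(CASES)
print(mrr)
```

    A higher MRR means that acceptable collocates tend to appear nearer the top of the suggestion list.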

    Table of Contents

    Chinese Abstract
    ABSTRACT
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    CHAPTER 1 INTRODUCTION
        1.1 Background
        1.2 Motivation
        1.3 Research Questions
        1.4 Organization of the Dissertation
    CHAPTER 2 COLLOCATION RESEARCH
        2.1 Notions of Collocation
        2.2 Collocation and Language Proficiency
        2.3 Collocation Error Analysis
        2.4 Word Usage in Academic Writing
        2.5 Collocation in Academic Writing
        2.6 Collocation Extraction
        2.7 Word Prediction Research
    CHAPTER 3 COLLOCATION WRITING ASSISTANT
        3.1 Collocation Tutor
        3.2 Lexical Assistant
        3.3 Collocation Checker
        3.4 AwkChecker
        3.5 Educational Testing Service (ETS)
        3.6 Liu
        3.7 Microsoft
        3.8 JustTheWord
        3.9 Summary
    CHAPTER 4 METHOD
        4.1 Problem Statement
        4.2 Procedure of Training Academic Collocation Checker
            4.2.1 Building Academic Writing Corpus from the Web
            4.2.2 Data Parsing and Collocation Extraction
            4.2.3 Using a Classifier for Suggestion Problem
            4.2.4 Feature Selection for Machine Learning
            4.2.5 Training a Machine Learning Model
        4.3 Automated Collocation Suggestion at Run-time
    CHAPTER 5 EVALUATION AND DISCUSSION
        5.1 Experimental Setting
            5.1.1 Training and Testing
            5.1.2 The Selected Training Method in the Experiment
        5.2 Experimental Data
        5.3 Evaluation Metrics
        5.4 Evaluation
            5.4.1 Evaluation on Models of Feature Combinations
            5.4.2 Evaluation on Different Machine Learning Methods
            5.4.3 Evaluation on the Different Data Sizes
            5.4.4 Evaluation on Data from Different Disciplines
            5.4.5 Evaluation on Different Collocation Types
            5.4.6 Evaluation on the Learner Corpus
        5.5 Discussion
            5.5.1 Limitation
            5.5.2 Review of Research Questions
    CHAPTER 6 APPLICATIONS
        6.1 Contextual Collocation Checker
        6.2 Collocation in Vocabulary Assessment
    CHAPTER 7 CONCLUSION
        7.1 Pedagogical Implication
        7.2 Future Research
    REFERENCE
    APPENDIX A
    APPENDIX B


    Full text availability: not authorized for public release (on-campus and off-campus networks)
