
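Graduate student: Lin, Ying-Hsiu (林映秀)
Thesis title: Automatically Identify Moves in Academic Abstracts (學術論文摘要的自動文步分析)
Advisor: Chang, Jason S. (張俊盛)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 46
Chinese keywords: move structure (文步結構), abstract (摘要), feature (特徵值), machine learning model (機器學習模型)
English keywords: Move Structure, Abstract, Feature, Maximum Entropy model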
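  • In this thesis, we use a supervised machine learning approach: we automatically extract salient features from academic abstracts, automatically learn the relationship between move labels and those features, and use it to train a machine learning model. The trained model can then automatically label the move structure of an abstract.
    In the training stage, we take a small set of abstracts manually annotated with moves, extract salient features from them, and train a machine learning model on the relationship between features and moves so that abstracts can be analyzed for moves automatically. At run time, we apply the trained model to label the moves in abstracts written by researchers in two domains, computer science and applied linguistics. Under human evaluation, the method achieves an average accuracy of over 80% in both domains, which indicates that it can successfully label the moves of academic abstracts.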


    This paper presents a novel method for automatically identifying the move structure of academic abstracts, intended to assist non-native speakers of English with academic writing. In our approach, we use a small set of manually tagged abstracts as a training corpus and analyze their significant features. A Maximum Entropy (ME) model is employed to classify the move structure of a given abstract. The approach involves automatically learning syntactic features and automatically building a statistical model. The proposed method outperforms previous research with significantly higher accuracy. Our results show that the ME model can suitably model abstract structure, and they imply that a more flexible move tagger can easily be applied to different research domains using only a small set of manually tagged abstracts.
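The idea of classifying sentences into moves with a Maximum Entropy model can be sketched in a few lines. A Maximum Entropy classifier is equivalent to multinomial logistic regression, so the sketch below trains one by stochastic gradient ascent on the conditional log-likelihood. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the four move labels, the toy training sentences, the lowercased-word features, and the hyperparameters are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus of (sentence, move label) pairs. The label set and
# the sentences are illustrative only; they are not the thesis's data.
TRAIN = [
    ("Writing abstracts is difficult for non-native speakers", "BACKGROUND"),
    ("Academic writing is an important skill for researchers", "BACKGROUND"),
    ("This paper presents a novel method for tagging moves", "PURPOSE"),
    ("In this paper we propose a new approach", "PURPOSE"),
    ("We use a small set of tagged abstracts as training data", "METHOD"),
    ("We employ a statistical model to classify sentences", "METHOD"),
    ("The results show that the method achieves high accuracy", "RESULT"),
    ("Experiments show that our tagger outperforms the baseline", "RESULT"),
]


def featurize(sentence):
    """Map a sentence to binary features: a bias term plus lowercased words.
    (An assumed feature set; the thesis's own features are not listed here.)"""
    feats = {"bias": 1.0}
    for word in sentence.lower().split():
        feats["w=" + word] = 1.0
    return feats


def predict_proba(weights, labels, feats):
    """P(label | sentence) = exp(score(label)) / sum of exp(score) over labels."""
    scores = {y: sum(weights[(y, f)] * v for f, v in feats.items()) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train(data, epochs=200, lr=0.1, l2=0.01):
    """Fit the weights by stochastic gradient ascent on the log-likelihood.
    The L2 penalty is applied only to the features active in each example."""
    labels = sorted({y for _, y in data})
    weights = defaultdict(float)
    examples = [(featurize(s), y) for s, y in data]
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict_proba(weights, labels, feats)
            for y in labels:
                # Gradient for (label, feature): observed count minus expected count.
                err = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[(y, f)] += lr * (err * v - l2 * weights[(y, f)])
    return weights, labels


def tag(weights, labels, sentence):
    """Return the most probable move label for one sentence."""
    probs = predict_proba(weights, labels, featurize(sentence))
    return max(probs, key=probs.get)


weights, labels = train(TRAIN)
print(tag(weights, labels, "We propose a novel method in this paper"))
```

Because the model is trained from labeled examples alone, moving to another research domain would only mean swapping in a different small set of tagged sentences, which is the flexibility the abstract points to. A realistic tagger would need far more data and richer features than this toy corpus provides.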

    摘要 (CHINESE ABSTRACT)
    ABSTRACT
    ACKNOWLEDGEMENT
    TABLE OF CONTENTS
    CHAPTER 1 INTRODUCTION
    CHAPTER 2 RELATED WORK
        2.1 MACROSTRUCTURE OF RAS
        2.2 STRUCTURAL REPRESENTATION OF ABSTRACTS
        2.3 IDENTIFYING MOVES AS TEXT CLASSIFICATION
    CHAPTER 3 METHOD
        3.1 PROBLEM STATEMENT
        3.2 LEARNING MOVE SENTENCE RELATION
            3.2.1 Collecting Abstracts from the web
            3.2.2 Manually label abstracts with moves
            3.2.3 Select the features for machine learning
            3.2.4 Train a machine learning model
        3.3 RUN-TIME AUTOMATIC MOVE-TAGGING
    CHAPTER 4 EXPERIMENTAL SETTING
        4.1 EXPERIMENTAL SETTING
        4.2 EVALUATION METRICS
    CHAPTER 5 EVALUATION RESULTS
        5.1 EVALUATION RESULTS
        5.2 DISCUSSION AND ERROR ANALYSIS
    CHAPTER 6 CONCLUSION AND FUTURE WORK
    REFERENCES
    APPENDIX A – GUIDELINES FOR HUMAN ANNOTATION OF ABSTRACT

