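Automatically Identify Moves in Academic Abstracts (學術論文摘要的自動文步分析) | National Tsing Hua University Theses and Dissertations Repository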

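Graduate student:	Lin, Ying-Hsiu (林映秀)
Thesis title:	Automatically Identify Moves in Academic Abstracts (學術論文摘要的自動文步分析)
Advisor:	Chang, Jason S. (張俊盛)
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication:	2009
Academic year of graduation:	97 (ROC calendar)
Language:	English
Number of pages:	46
Chinese keywords:	move structure (文步結構), abstract (摘要), feature (特徵值), machine learning model (機器學習模型)
English keywords:	Move Structure, Abstract, Feature, Maximum Entropy model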

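In this thesis, we use a supervised machine learning method: important features are automatically extracted from paper abstracts, the relation between move labels and features is established automatically, and a machine learning model is trained on that relation. The trained model can then automatically label the move structure of paper abstracts.
In the training stage, we take a small number of abstracts manually labeled with moves, extract important features from them, and train a machine learning model to learn the relation between features and moves, so that abstracts can be analyzed for moves automatically. At run time, we apply the trained model to label the moves of abstracts written by researchers in two domains, computer science and applied linguistics. Under human evaluation, the average accuracy exceeds 80% in both domains, which shows that the method can successfully label the moves of paper abstracts.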

This paper presents a novel method for automatically identifying the move structure of academic abstracts, in order to assist non-native speakers of English with academic writing. In our approach, we use a small set of manually tagged abstracts as a training corpus and analyze their significant features. A Maximum Entropy (ME) model is employed to classify the moves in a given abstract. The method involves automatically learning syntactic features and automatically building a statistical model. The proposed method outperforms previous research, with significantly higher accuracy. Our methodology shows that the ME model can suitably model abstract structure, and implies that a more flexible move tagger can easily be applied to different research domains using a small set of manually tagged abstracts.
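
The classification step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the move labels, the toy sentences, the feature set (lower-cased words plus a coarse position bucket), and the training hyperparameters are all assumptions, since this record does not list the actual features. A Maximum Entropy classifier is equivalent to multinomial logistic regression, which the sketch trains with plain stochastic gradient ascent using only the Python standard library.

```python
import math
from collections import defaultdict

# Toy training data: (move label, relative position in abstract, sentence).
# Labels and sentences are hypothetical, not taken from the thesis corpus.
TRAIN = [
    ("Background", 0.0, "academic writing is difficult for non-native speakers"),
    ("Purpose", 0.3, "this paper presents a method for tagging moves"),
    ("Method", 0.5, "we use a maximum entropy model trained on tagged abstracts"),
    ("Result", 0.8, "the proposed method outperforms previous research"),
    ("Conclusion", 1.0, "the results imply that the tagger can be applied to other domains"),
]


def features(position, sentence):
    """Assumed feature set: a bias term, word indicators, and a position bucket."""
    feats = {"bias": 1.0}
    for word in sentence.lower().split():
        feats["w=" + word] = 1.0
    # Bucket the relative position (0.0 to 1.0) into start / middle / end.
    feats["pos=%d" % min(int(position * 3), 2)] = 1.0
    return feats


def predict_proba(weights, labels, feats):
    """Softmax over per-label linear scores: the Maximum Entropy model form."""
    scores = {
        y: sum(weights[y].get(f, 0.0) * v for f, v in feats.items())
        for y in labels
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train(data, epochs=300, lr=0.1, l2=0.01):
    """Fit weights by stochastic gradient ascent on the L2-regularized log-likelihood."""
    labels = sorted({y for y, _, _ in data})
    weights = {y: defaultdict(float) for y in labels}
    examples = [(y, features(p, s)) for y, p, s in data]
    for _ in range(epochs):
        for gold, feats in examples:
            probs = predict_proba(weights, labels, feats)
            for y in labels:
                # Gradient: observed indicator minus predicted probability.
                err = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[y][f] += lr * (err * v - l2 * weights[y][f])
    return weights, labels


def tag(weights, labels, position, sentence):
    """Return the most probable move label for one sentence."""
    probs = predict_proba(weights, labels, features(position, sentence))
    return max(probs, key=probs.get)


if __name__ == "__main__":
    weights, labels = train(TRAIN)
    print(tag(weights, labels, 0.5, "we use a model trained on abstracts"))
```

In a realistic setting, the toy list would be replaced by the manually tagged abstracts, and the word indicators by whatever features the annotation analysis shows to be significant.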

ABSTRACT (CHINESE)
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
CHAPTER 1  INTRODUCTION
CHAPTER 2  RELATED WORK
2.1  MACROSTRUCTURE OF RAS
2.2  STRUCTURAL REPRESENTATION OF ABSTRACTS
2.3  IDENTIFYING MOVES AS TEXT CLASSIFICATION
CHAPTER 3  METHOD
3.1  PROBLEM STATEMENT
3.2  LEARNING MOVE SENTENCE RELATION
3.2.1  Collecting Abstracts from the web
3.2.2  Manually label abstracts with moves
3.2.3  Select the features for machine learning
3.2.4  Train a machine learning model
3.3  RUN-TIME AUTOMATIC MOVE-TAGGING
CHAPTER 4  EXPERIMENTAL SETTING
4.1  EXPERIMENTAL SETTING
4.2  EVALUATION METRICS
CHAPTER 5  EVALUATION RESULTS
5.1  EVALUATION RESULTS
5.2  DISCUSSION AND ERROR ANALYSIS
CHAPTER 6  CONCLUSION AND FUTURE WORK
REFERENCES
APPENDIX A – GUIDELINES FOR HUMAN ANNOTATION OF ABSTRACT

                                

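Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)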
