| Graduate student: | 卡拉度 Gerardo O. Figueroa |
|---|---|
| Thesis title: | HYBRIDRANK: A COLLABORATION BETWEEN SUPERVISED AND UNSUPERVISED APPROACHES FOR KEYPHRASE EXTRACTION 混合式關鍵詞選取法 |
| Advisor: | 陳宜欣 Chen, Yi-Shin |
| Oral defense committee: | Soo, Von-Wun |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
| Year of publication: | 2011 |
| Academic year of graduation: | 99 |
| Language: | English |
| Number of pages: | 32 |
| Keywords: | Keyword extraction, Keyphrase extraction, Supervised method, Unsupervised method, Hybrid |
Traditionally, keyphrases (or keywords) have been manually assigned to documents by
their authors or by human indexers. This, however, has become impractical due to
the massive growth of documents, particularly short articles (e.g. microblogs, abstracts,
snippets), on the Internet each day, thus creating a need for systems that automatically
extract keyphrases from documents. Automatic keyphrase extraction methods have generally
taken either supervised or unsupervised approaches. Supervised methods extract
keyphrases by using a training document set, thus acquiring knowledge from a global
collection of texts. Conversely, unsupervised methods rank phrases by their importance
in a single-document context, without prior learning. We present a hybrid keyphrase
extraction method for short articles, HybridRank, which leverages the benefits of both
approaches. Our system implements modified versions of the TextRank [6] (unsupervised)
and KEA [16] (supervised) methods, and applies a merging algorithm to produce
an overall list of keyphrases. We have tested HybridRank on more than 900 abstracts belonging
to a wide variety of subjects, including engineering, science, physics and IT, and
show its superior effectiveness. It is observed that knowledge collaboration between supervised
and unsupervised methods can produce higher-quality keyphrases than applying
these methods individually.
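
The abstract does not spell out the merging algorithm, so the following is only a minimal sketch of how two ranked keyphrase lists could be combined. It uses reciprocal rank fusion as a stand-in technique, not HybridRank's actual merging step. The function `merge_rankings`, the constant `k`, and the two sample lists are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def merge_rankings(rankings, k=60, top_n=5):
    """Merge several ranked keyphrase lists with reciprocal rank fusion.

    Each phrase scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. Assumption: this is a generic stand-in and
    not the merging algorithm used by HybridRank.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, phrase in enumerate(ranked, start=1):
            scores[phrase.lower()] += 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ordered[:top_n]]


# Hypothetical outputs of an unsupervised and a supervised ranker.
unsupervised = ["keyphrase extraction", "short articles", "training set"]
supervised = ["keyphrase extraction", "training set", "hybrid method"]

merged = merge_rankings([unsupervised, supervised], top_n=3)
print(merged)
```

Phrases that both rankers place near the top accumulate the highest fused score, which is one simple way to let the two sources of evidence reinforce each other.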
[1] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine.
Computer networks and ISDN systems, 30(1-7):107-117, 1998.
[2] E. Frank, G. W. Paynter, and I. H. Witten. Domain-specific keyphrase extraction.
IJCAI, 1999.
[3] A. Hulth. Improved automatic keyword extraction given more linguistic knowledge.
Proceedings of the 2003 conference on Empirical methods in natural language processing,
pages 216-223, 2003.
[4] Institute of Electrical and Electronics Engineers (IEEE). IEEE Xplore.
http://ieeexplore.ieee.org/. [Online; accessed March-2011].
[5] M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotated
corpus of english: The penn treebank. Computational Linguistics, 19(2):313-330,
1993.
[6] R. Mihalcea and P. Tarau. Textrank: Bringing order into texts. Proceedings of
EMNLP, pages 404-411, 2004.
[7] T. Nguyen and M. Kan. Keyphrase extraction in scientific publications. Proceedings
of ICADL2007, 2007.
[8] Princeton University. Wordnet: A lexical database for English.
http://wordnet.princeton.edu/. [Online; accessed March-2011].
[9] Ranks Webmaster Tools. English Stopwords.
http://www.ranks.nl/resources/stopwords.html. [Online; accessed March-2011].
[10] The Institution of Engineering and Technology. Inspec Direct.
http://inspecdirect-service.theiet.org/private/home.aspx. [Online].
[11] K. Toutanova and C. D. Manning. Enriching the knowledge sources used in a maximum
entropy part-of-speech tagger. pages 63-70, 2000.
[12] P. D. Turney. Coherent keyphrase extraction via web mining. Proceedings of the
Eighteenth Research Council, 1999.
[13] P. D. Turney. Learning algorithms for keyphrase extraction. Inf. Retr., 2(4):303-336,
2000.
[14] X. Wan and J. Xiao. Collabrank: towards a collaborative approach to single-document
keyphrase extraction. Proceedings of the 22nd International Conference
on Computational Linguistics, 1:969-976, 2008.
[15] X. Wan, J. Yang, and J. Xiao. Towards an iterative reinforcement approach for
simultaneous document summarization and keyword extraction. Annual Meeting-
Association for Computational Linguistics, 45(1):552, 2007.
[16] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. Kea:
practical automatic keyphrase extraction. DL '99: Proceedings of the fourth ACM
conference on Digital libraries, pages 254-255, 1999.
[17] H. Zha. Generic summarization and keyphrase extraction using mutual reinforcement
principle and sentence clustering. Proceedings of SIGIR2003, pages 113-120, 2002.