HYBRIDRANK: A COLLABORATION BETWEEN SUPERVISED AND UNSUPERVISED APPROACHES FOR KEYPHRASE EXTRACTION | National Tsing Hua University Theses and Dissertations Repository



Graduate student: 卡拉度 Gerardo O. Figueroa
Thesis title: HYBRIDRANK: A COLLABORATION BETWEEN SUPERVISED AND UNSUPERVISED APPROACHES FOR KEYPHRASE EXTRACTION (Chinese title: 混合式關鍵詞選取法)
Advisor: 陳宜欣 Chen, Yi-Shin
Oral defense committee: Soo, Von-Wun
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar, i.e., 2010-2011)
Language: English
Number of pages: 32
Keywords: Keyword extraction, Keyphrase extraction, Supervised method, Unsupervised method, Hybrid


Traditionally, keyphrases (or keywords) have been assigned to documents manually, either by
their authors or by human indexers. This has become impractical because of the massive number
of new documents that appear on the Internet every day, particularly short articles (e.g.,
microblogs, abstracts, and snippets). There is therefore a need for systems that extract
keyphrases from documents automatically. Automatic keyphrase extraction methods have generally
taken either a supervised or an unsupervised approach. Supervised methods learn from a set of
training documents, thus acquiring knowledge from a global collection of texts. Conversely,
unsupervised methods rank phrases by their importance within a single document, without prior
learning. We present HybridRank, a hybrid keyphrase extraction method for short articles that
leverages the benefits of both approaches. Our system implements modified versions of the
TextRank [6] (unsupervised) and KEA [16] (supervised) methods. It then applies a merging
algorithm to combine their outputs into a single list of keyphrases. We have tested HybridRank
on more than 900 abstracts spanning a wide variety of subjects, including engineering, science,
physics, and IT. The results show that collaboration between supervised and unsupervised methods
can produce higher-quality keyphrases than either method applied individually.
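The following minimal Python sketch illustrates the ingredients named above. It is an illustration under stated assumptions, not the thesis's implementation.

`textrank_scores` follows the TextRank recipe from [6]. Words that co-occur within a small window are linked in an undirected graph. PageRank [1] is then iterated as S(v) = (1 - d) + d * (sum over neighbors u of S(u) / deg(u)), with d = 0.85. `textrank_keyphrases` keeps the top third of the vertices as keywords and joins adjacent keywords into phrases, as [6] describes.

Several pieces are assumptions:
- A tiny hypothetical stopword list replaces the part-of-speech filter used in [6].
- Phrase scores are simply the sum of their word scores.
- `merge_rankings` is a hypothetical weighted sum of max-normalized scores. It only stands in for HybridRank's own merging algorithm, whose details this page does not give.
- The supervised (KEA) scores are supplied as input rather than computed.

```python
import re

# Hypothetical minimal stopword list; a real system would use a full list such as [9].
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is", "of", "on", "the", "to", "with"}


def textrank_scores(words, window=2, damping=0.85, max_iter=100, tol=1e-6):
    """Score words with PageRank over an undirected, unweighted co-occurrence graph."""
    # Link two distinct words if they occur within `window` positions of each other.
    neighbors = {}
    for i, word in enumerate(words):
        neighbors.setdefault(word, set())
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors.setdefault(other, set()).add(word)
    if not neighbors:
        return {}
    # Iterate S(v) = (1 - d) + d * sum(S(u) / deg(u) for u adjacent to v) until stable.
    scores = dict.fromkeys(neighbors, 1.0)
    for _ in range(max_iter):
        updated = {
            v: (1 - damping) + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[v])
            for v in neighbors
        }
        delta = max(abs(updated[v] - scores[v]) for v in neighbors)
        scores = updated
        if delta < tol:
            break
    return scores


def textrank_keyphrases(text, stopwords=STOPWORDS, top_k=None, **kwargs):
    """Rank candidate words with TextRank, then join adjacent top words into phrases."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Assumption: stopword filtering stands in for the part-of-speech filter of [6].
    candidates = [t for t in tokens if t not in stopwords]
    scores = textrank_scores(candidates, **kwargs)
    # By default keep the top third of the graph's vertices as keywords, as in [6].
    k = top_k if top_k is not None else max(1, len(scores) // 3)
    keywords = set(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    # Collapse runs of adjacent keywords in the original token sequence into phrases.
    phrases, run = {}, []
    for token in tokens + [None]:  # the None sentinel flushes the final run
        if token in keywords:
            run.append(token)
        elif run:
            # Phrase score = sum of its word scores (an illustrative choice, not from [6]).
            phrases[" ".join(run)] = sum(scores[w] for w in run)
            run = []
    return phrases


def merge_rankings(supervised, unsupervised, weight=0.5):
    """Hypothetical merge: weighted sum of max-normalized scores from the two rankers."""
    def normalize(scores):
        top = max(scores.values(), default=0.0)
        return {p: s / top for p, s in scores.items()} if top > 0 else dict(scores)

    sup, unsup = normalize(supervised), normalize(unsupervised)
    combined = {
        p: weight * sup.get(p, 0.0) + (1 - weight) * unsup.get(p, 0.0)
        for p in sup.keys() | unsup.keys()
    }
    # Highest combined score first; ties are broken alphabetically for a stable order.
    return sorted(combined, key=lambda p: (-combined[p], p))


if __name__ == "__main__":
    abstract = "Automatic keyphrase extraction assigns keyphrases to short articles."
    unsupervised = textrank_keyphrases(abstract)
    supervised = {"keyphrase extraction": 0.8, "short articles": 0.6}  # made-up scores
    print(merge_rankings(supervised, unsupervised))
```

Running the file prints a merged ranking for a one-sentence example with made-up supervised scores. The main design choice is to normalize each list by its maximum before combining. Raw scores are not comparable: TextRank scores can exceed 1, whereas a supervised model such as KEA's Naive Bayes classifier outputs probabilities. A phrase found by only one ranker receives no contribution from the other.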

Contents
Summary
Acknowledgments
List of Tables
List of Figures
Introduction
Related Work
Background
1 The KEA Algorithm
1.1 Candidate phrase generation
1.2 Feature extraction
1.3 Training
1.4 Ranking
2 The TextRank Algorithm
2.1 Graph construction
2.2 Ranking
Framework
1 Preprocessing
2 Supervised Ranking
2.1 Candidate phrase generation
2.2 Feature extraction
2.3 Training
2.4 Ranking
3 Unsupervised Ranking
3.1 Graph construction
4 Merging
4.1 Keyphrase list merging
4.2 Post-processing
Experiments
1 The Corpora
1.1 Document collections
1.2 Corpora statistics
2 Experimental Setup
3 Evaluation and Discussion
Conclusions and Future Work
Bibliography


[1] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7):107–117, 1998.
[2] E. Frank, G. W. Paynter, and I. H. Witten. Domain-specific keyphrase extraction. IJCAI, 1999.
[3] A. Hulth. Improved automatic keyword extraction given more linguistic knowledge. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, pages 216–223, 2003.
[4] Institute of Electrical and Electronics Engineers (IEEE). IEEE Xplore. http://ieeexplore.ieee.org/. [Online; accessed March 2011].
[5] M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.
[6] R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. Proceedings of EMNLP, pages 404–411, 2004.
[7] T. Nguyen and M. Kan. Keyphrase extraction in scientific publications. Proceedings of ICADL 2007, 2007.
[8] Princeton University. WordNet: A lexical database for English. http://wordnet.princeton.edu/. [Online; accessed March 2011].
[9] Ranks Webmaster Tools. English Stopwords. http://www.ranks.nl/resources/stopwords.html. [Online; accessed March 2011].
[10] The Institution of Engineering and Technology. Inspec Direct. http://inspecdirect-service.theiet.org/private/home.aspx. [Online].
[11] K. Toutanova and C. D. Manning. Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. Pages 63–70, 2000.
[12] P. D. Turney. Coherent keyphrase extraction via web mining. Proceedings of the Eighteenth Research Council, 1999.
[13] P. D. Turney. Learning algorithms for keyphrase extraction. Inf. Retr., 2(4):303–336, 2000.
[14] X. Wan and J. Xiao. CollabRank: Towards a collaborative approach to single-document keyphrase extraction. Proceedings of the 22nd International Conference on Computational Linguistics, 1:969–976, 2008.
[15] X. Wan, J. Yang, and J. Xiao. Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. Annual Meeting of the Association for Computational Linguistics, 45(1):552, 2007.
[16] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. KEA: Practical automatic keyphrase extraction. DL '99: Proceedings of the Fourth ACM Conference on Digital Libraries, pages 254–255, 1999.
[17] H. Zha. Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. Proceedings of SIGIR 2002, pages 113–120, 2002.


Full-text availability: not authorized for public release on either the on-campus or the off-campus network.

