Author: 卡拉度
Gerardo O. Figueroa
Thesis title: HYBRIDRANK: A COLLABORATION BETWEEN SUPERVISED AND UNSUPERVISED APPROACHES FOR KEYPHRASE EXTRACTION
混合式關鍵詞選取法
Advisor: 陳宜欣
Chen, Yi-Shin
Oral defense committee: Soo, Von-Wun
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2011
Academic year of graduation: 99
Language: English
Number of pages: 32
Keywords: Keyword extraction, Keyphrase extraction, Supervised method, Unsupervised method, Hybrid
  • Traditionally, keyphrases (or keywords) have been manually assigned to documents by
    their authors or by human indexers. This, however, has become impractical due to
    the massive growth of documents, particularly short articles (e.g., microblogs, abstracts,
    snippets), on the Internet each day, thus creating a need for systems that automatically
    extract keyphrases from documents. Automatic keyphrase extraction methods have generally
    taken either supervised or unsupervised approaches. Supervised methods extract
    keyphrases by using a training document set, thus acquiring knowledge from a global
    collection of texts. Conversely, unsupervised methods rank phrases by their importance
    in a single-document context, without prior learning. We present a hybrid keyphrase
    extraction method for short articles, HybridRank, which leverages the benefits of both
    approaches. Our system implements modified versions of the TextRank [6] (unsupervised)
    and KEA [16] (supervised) methods, and applies a merging algorithm to produce
    an overall list of keyphrases. We have tested HybridRank on more than 900 abstracts belonging
    to a wide variety of subjects, including engineering, science, physics and IT, and
    show its superior effectiveness. It is observed that knowledge collaboration between supervised
    and unsupervised methods can produce higher-quality keyphrases than applying
    these methods individually.
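    The two ranking styles described above, and the idea of merging their outputs, can be illustrated with a minimal sketch. It is not the thesis's implementation: `textrank_words` is a simplified TextRank-style ranker (PageRank over a word co-occurrence graph), and `merge_ranked` uses a reciprocal-rank sum that is purely an assumption made for illustration, since the record does not describe HybridRank's merging algorithm. All function names, parameters, and default values here are hypothetical.

```python
from collections import defaultdict


def textrank_words(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank over an undirected co-occurrence graph.

    Simplified TextRank-style sketch: two words are linked when they
    appear within `window` positions of each other in the token list.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))


def merge_ranked(supervised, unsupervised, top_k=5):
    """Hypothetical merge: sum reciprocal ranks across both ranked lists.

    This is an illustrative stand-in, NOT the merging algorithm of the thesis.
    """
    combined = defaultdict(float)
    for ranking in (supervised, unsupervised):
        for position, phrase in enumerate(ranking, start=1):
            combined[phrase] += 1.0 / position
    return sorted(combined, key=lambda p: (-combined[p], p))[:top_k]


if __name__ == "__main__":
    tokens = "hybrid keyphrase extraction merges supervised keyphrase ranking".split()
    print(textrank_words(tokens))
    print(merge_ranked(["keyphrase extraction", "training set"],
                       ["graph ranking", "keyphrase extraction"]))
```

    A phrase that both rankers place near the top accumulates the largest combined score, which is one simple way to let the two sources of evidence reinforce each other.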


    Contents
    Summary
    Acknowledgments
    List of Tables
    List of Figures
    1 Introduction
    2 Related Work
    3 Background
      3.1 The KEA Algorithm
        3.1.1 Candidate phrase generation
        3.1.2 Feature extraction
        3.1.3 Training
        3.1.4 Ranking
      3.2 The TextRank Algorithm
        3.2.1 Graph construction
        3.2.2 Ranking
    4 Framework
      4.1 Preprocessing
      4.2 Supervised Ranking
        4.2.1 Candidate phrase generation
        4.2.2 Feature extraction
        4.2.3 Training
        4.2.4 Ranking
      4.3 Unsupervised Ranking
        4.3.1 Graph construction
      4.4 Merging
        4.4.1 Keyphrase list merging
        4.4.2 Post-processing
    5 Experiments
      5.1 The Corpora
        5.1.1 Document collections
        5.1.2 Corpora statistics
      5.2 Experimental Setup
      5.3 Evaluation and Discussion
    6 Conclusions and Future Work
    Bibliography

    [1] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine.
    Computer networks and ISDN systems, 30(1-7):107-117, 1998.
    [2] E. Frank, G. W. Paynter, and I. H. Witten. Domain-specific keyphrase extraction.
    IJCAI, 1999.
    [3] A. Hulth. Improved automatic keyword extraction given more linguistic knowledge.
    Proceedings of the 2003 conference on Empirical methods in natural language processing,
    pages 216-223, 2003.
    [4] Institute of Electrical and Electronics Engineers (IEEE). IEEE Xplore.
    http://ieeexplore.ieee.org/. [Online; accessed March-2011].
    [5] M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotated
    corpus of english: The penn treebank. Computational Linguistics, 19(2):313-330,
    1993.
    [6] R. Mihalcea and P. Tarau. Textrank: Bringing order into texts. Proceedings of
    EMNLP, pages 404-411, 2004.
    [7] T. Nguyen and M. Kan. Keyphrase extraction in scientific publications. Proceedings
    of ICADL2007, 2007.
    [8] Princeton University. Wordnet: A lexical database for English.
    http://wordnet.princeton.edu/. [Online; accessed March-2011].
    [9] Ranks Webmaster Tools. English Stopwords.
    http://www.ranks.nl/resources/stopwords.html. [Online; accessed March-2011].
    [10] The Institution of Engineering and Technology. Inspec Direct.
    http://inspecdirect-service.theiet.org/private/home.aspx. [Online].
    [11] K. Toutanova and C. D. Manning. Enriching the knowledge sources used in a maximum
    entropy part-of-speech tagger. pages 63-70, 2000.
    [12] P. D. Turney. Coherent keyphrase extraction via web mining. Proceedings of the
    Eighteenth Research Council, 1999.
    [13] P. D. Turney. Learning algorithms for keyphrase extraction. Inf. Retr., 2(4):303-336,
    2000.
    [14] X. Wan and J. Xiao. Collabrank: towards a collaborative approach to single-document
    keyphrase extraction. Proceedings of the 22nd International Conference
    on Computational Linguistics, 1:969-976, 2008.
    [15] X. Wan, J. Yang, and J. Xiao. Towards an iterative reinforcement approach for
    simultaneous document summarization and keyword extraction.
    Annual Meeting-Association for Computational Linguistics, 45(1):552, 2007.
    [16] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning. Kea:
    practical automatic keyphrase extraction. DL '99: Proceedings of the fourth ACM
    conference on Digital libraries, pages 254-255, 1999.
    [17] H. Zha. Generic summarization and keyphrase extraction using mutual reinforcement
    principle and sentence clustering. Proceedings of SIGIR2003, pages 113-120, 2002.

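    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)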
