
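Graduate student: 胡家瑜 (Hu, Jia-Yu)
Thesis title: 追蹤進行中新聞議題產生事件主軸摘要
Monitoring the Progressive news topic with Storyline-based Summarization
Advisor: 林福仁 (Lin, Fu-ran)
Oral defense committee:
Degree: Master
Department: College of Technology Management - Institute of Technology Management
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: English
Number of pages: 81
Chinese keywords: 新聞事件回顧 (news event retrospection), 漸進式分群 (incremental clustering), 事件緒 (event threading), 摘要系統 (summarization system)
English keywords: News topic retrospection, incremental clustering, event threading, summarization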
  • This thesis proposes a method for monitoring a progressive news topic with storyline-based summarization. Because the volume of news on a progressive topic keeps growing, we need incremental clustering, which updates clusters (events) as new stories arrive without frequently re-clustering the whole collection.
    The proposed news topic retrospection has three major components. First, incremental clustering identifies events. Second, the relationships between events are identified, and the relevance of each event to the main storyline is evaluated. Third, representative sentences are extracted and clustered with Chameleon to compose a summary under the main theme.
    Experimental results show that incremental clustering achieves good performance and cluster quality, and that the approach can help news readers comprehend the evolution of progressive news topics.
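The first component, incremental event identification, can be illustrated with a minimal single-pass sketch. This record does not state the similarity measure, threshold, or update rule used in the thesis, so everything below is an assumption made for illustration: raw term counts as document vectors, cosine similarity, a hypothetical threshold of 0.3, and the hypothetical names `cosine` and `IncrementalClusterer`. Each arriving story either joins its most similar existing cluster or starts a new one, so earlier stories are never re-clustered.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-pass clustering: illustrative only, not the thesis's method."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, not taken from the thesis
        self.centroids = []         # summed term counts per cluster
        self.members = []           # document ids per cluster

    def add(self, doc_id, text):
        """Assign one new document and return its cluster index."""
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Update only the matched cluster. Cosine ignores vector length,
            # so summed counts behave like a mean centroid.
            self.centroids[best].update(vec)
            self.members[best].append(doc_id)
            return best
        # No cluster is similar enough: the story starts a new event.
        self.centroids.append(vec)
        self.members.append([doc_id])
        return len(self.centroids) - 1
```

Calling `add` once per incoming story keeps the cost of each update proportional to the number of existing clusters rather than the number of stored stories, which is the property the abstract asks of incremental clustering. A real system would likely add stop-word removal and term weighting before computing similarity.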


    1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Objectives
      1.4 Thesis Framework
    2 Literature Review
      2.1 Topic Detection and Tracking
      2.2 Clustering Algorithm
        2.2.1 Hierarchical Clustering
        2.2.2 Partitional Clustering
        2.2.3 Incremental Clustering
      2.3 Text Summarization
      2.4 Self-organizing Maps
      2.5 Chameleon
      2.6 News Topic Retrospection
    3 Research Methodology
      3.1 Definition
      3.2 System Framework
      3.3 Preprocess
      3.4 Event Identification
      3.5 Main Storyline Construction
      3.6 Storyline-based Summarization
    4 System Implementation and Results
      4.1 Data Source
      4.2 System Implementation
      4.3 System Results
    5 Experimental Design
      5.1 Experimental Design
      5.2 Experimental Results
        5.2.1 The Comparison of Clustering Results
        5.2.2 The Evaluation of Experimental Results by Human Subjects
      5.3 Discussions
    6 Conclusion and Future Work
    References
    Appendix


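    Full-text release date: the full text is not authorized for public release (on-campus network)
    Full-text release date: the full text is not authorized for public release (off-campus network)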
