
Graduate student: 朱瑞琪 (Zhu-Chi Chu)
Thesis title: The Summarization of Chinese News Articles by Temporal or Themed Sequences
Thesis title (Chinese): 摘要中文新聞之報導-以時間或主題排序
Advisor: 林福仁 (Fu-ran Lin)
Oral examination committee:
Degree: Master
Department: College of Technology Management, Institute of Technology Management
Year of publication: 2008
Academic year of graduation: 96
Language: English
Number of pages: 82
Keywords: Text summarization, intra-paragraph, inter-paragraph, temporal, themed, news topic summarization
  • Most summarization systems can extract important sentences, but few of them address readability. This thesis proposes a summarization system that considers sentence coherence and orders sentences according to features of the news, helping readers comprehend news topics.
    The proposed summarization system has three major components. First, the event clustering module identifies events with a Self-Organizing Map (SOM) and, within every event, identifies episodes with Chameleon. Second, the intra-paragraph sequencing module extracts the features of every event in a news topic and selects a composition strategy (temporal, themed, or hybrid) to compose the sentences of an event into a paragraph. Third, the inter-paragraph sequencing module calculates the topic temporal dependence and, based on that feature, orders the paragraphs either temporally or by theme.
    Experimental results show that different users may prefer summaries composed by different methods. A mechanism is therefore needed that can order sentences in several ways and choose a suitable method (temporal sequence, themed sequence, or both) according to the features of each event.
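
    The inter-paragraph step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how topic temporal dependence is computed, so the sketch takes it as a given score in [0, 1]. The Paragraph structure, the order_paragraphs function, the theme labels, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Paragraph:
    """One event summarized as a paragraph (hypothetical structure)."""
    text: str
    event_date: date  # assumed: date of the event the paragraph covers
    theme: str        # assumed: a theme label attached to the event


def order_paragraphs(paragraphs, temporal_dependence, threshold=0.5):
    """Order paragraphs temporally or by theme.

    temporal_dependence is treated as a precomputed score in [0, 1];
    both the score and the threshold are assumptions of this sketch.
    """
    if temporal_dependence >= threshold:
        # Temporal sequence: oldest event first.
        return sorted(paragraphs, key=lambda p: p.event_date)

    # Themed sequence: keep themes in order of first appearance,
    # and sort chronologically inside each theme.
    first_seen = {}
    for p in paragraphs:
        first_seen.setdefault(p.theme, len(first_seen))
    return sorted(paragraphs, key=lambda p: (first_seen[p.theme], p.event_date))


if __name__ == "__main__":
    sample = [
        Paragraph("b", date(2008, 1, 3), "policy"),
        Paragraph("a", date(2008, 1, 1), "market"),
        Paragraph("c", date(2008, 1, 2), "policy"),
    ]
    print([p.text for p in order_paragraphs(sample, temporal_dependence=0.8)])
    print([p.text for p in order_paragraphs(sample, temporal_dependence=0.2)])
```

    With a high score the paragraphs come out in date order; with a low score paragraphs sharing a theme stay together. A hybrid strategy, which the abstract mentions for intra-paragraph composition, is not covered by this sketch.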


    Table of Contents
    Table of Figures
    Table of Tables
    1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Objectives
      1.4 Thesis Framework
    2 Literature Review
      2.1 Summarization System
        2.1.1 Typical
        2.1.2 Storyline-based
        2.1.3 Graph-based
        2.1.4 Ontology-based
        2.1.5 Relationship-based
        2.1.6 Chinese-summarization
      2.2 Self-organizing Maps (SOM)
      2.3 Chameleon
      2.4 Summarization by Informative and Event Words
    3 System Framework and Methodology
      3.1 Definition
      3.2 Research Framework
      3.3 Preprocess
      3.4 Event and Episode Identification
      3.5 Extract Event Features
      3.6 Ordering Inter-paragraph and Intra-paragraph
    4 System Implementation and Results
      4.1 System Implementation
      4.2 News Topic Summarization Results
    5 Experimental Design and Results
      5.1 Experimental Design
      5.2 Experimental Results
        5.2.1 Intra-paragraph Results
        5.2.2 Inter-paragraph Results
      5.3 Discussions
    6 Conclusion and Future Work
      6.1 Conclusion
      6.2 Research Limitation
      6.3 Future Work
    References
    Appendix A. Examples of NPD
    Appendix B. Summarization Results
    Appendix C. Snapshot of the User Interface in Experimentation


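    Full-text release date: the full text is not authorized for public release (on-campus network)
    Full-text release date: the full text is not authorized for public release (off-campus network)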
