Automatic Content Analysis Using Text Mining to Investigate How News Events Trigger the Response of Society


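Author:	戴瑜廷 Tai, Yu Ting
Thesis title:	Automatic Content Analysis Using Text Mining to Investigate How News Events Trigger the Response of Society (Chinese title: 應用文字探勘之自動化新聞文本分析以探討社會對新聞事件之反應)
Advisor:	林福仁 Lin, Fu Ren
Oral examination committee:	雷松亞 Ray, Soumya; 徐茉莉 Shmueli, Galit
Degree:	Master
Department:	College of Technology Management, Institute of Service Science
Year of publication:	2015
Academic year of graduation:	103 (ROC calendar, i.e., 2014–2015)
Language:	English
Number of pages:	90
Chinese keywords:	news summarization, text mining, text analysis, food safety, focused conversation, social learning
English keywords:	Automatic Content Analysis, Focused Conversation Method, ORID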

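In recent years, food safety problems have surfaced one after another, with a series of incidents involving plasticizers, poison starch, and fake oil. However, the volume of related reporting is so large that ordinary readers can hardly read all of it effectively. Moreover, because these successive incidents all concern food safety, whether society has learned from past incidents and responds differently when a similar incident occurs is also worth investigating.
Yet from unstructured information, news audiences can hardly tell how society's responses differ across events. The purpose of this study is therefore to automatically analyze multiple events under the same topic and to investigate society's responses to news events.
This study proposes an automated text analysis system that analyzes multiple news events belonging to the same topic. First, it uses clustering to present what each stakeholder said at each stage of an event, along two dimensions: stage of event development and stakeholder. Second, the system uses summarization to extract the key developments of each event and provide a news summary of how that single event unfolded. Finally, the Focused Conversation Method (ORID) is used to evaluate the system's effectiveness while also exploring readers' responses to the events.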
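This record does not say how event stages are determined or which clustering algorithm produces the stage × stakeholder view (the outline lists hierarchical cluster analysis). As a hedged illustration only, the sketch below assumes each extracted statement already carries a publication date and a stakeholder label, derives event stages by single-linkage clustering of publication dates cut at a gap threshold, and then groups statements into a two-dimensional grid. The record layout, the `max_gap` threshold, and the function names are hypothetical, not the thesis's implementation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record: (publication date, stakeholder label, statement text).
# The thesis's actual data model is not given in this record.
Opinion = tuple[date, str, str]


def split_into_stages(days: list[date], max_gap: int) -> list[list[date]]:
    """Single-linkage clustering of publication days on a 1-D time axis.

    On 1-D data, cutting a single-linkage dendrogram at distance `max_gap`
    is equivalent to sorting the points and starting a new cluster whenever
    the gap to the previous point exceeds `max_gap` days.
    """
    stages: list[list[date]] = []
    for day in sorted(set(days)):
        if stages and (day - stages[-1][-1]).days <= max_gap:
            stages[-1].append(day)  # close enough: same stage
        else:
            stages.append([day])  # gap too large: open a new stage
    return stages


def stage_stakeholder_grid(
    opinions: list[Opinion], max_gap: int = 3
) -> tuple[list[list[date]], dict[tuple[int, str], list[str]]]:
    """Group statements into a (stage index, stakeholder) -> statements grid."""
    stages = split_into_stages([day for day, _, _ in opinions], max_gap)
    stage_of = {day: i for i, stage in enumerate(stages) for day in stage}
    grid: defaultdict[tuple[int, str], list[str]] = defaultdict(list)
    for day, stakeholder, text in opinions:
        grid[(stage_of[day], stakeholder)].append(text)
    return stages, dict(grid)
```

For example, with `max_gap=3`, statements dated January 1, January 2, and January 10 fall into two stages, {January 1, January 2} and {January 10}; each grid cell then lists what one stakeholder said in one stage. A real system would likely cluster on richer features than dates alone.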
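With the proposed automated text analysis system, the general public can understand the development of news events more quickly and effectively, and recall the feelings, thoughts, and actions they had when the events occurred.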

In recent years, food safety crises have broken out one after another. Three major food safety events occurred in sequence: “Plasticizer”, “Poison starch”, and “Fake oil”. However, the related news reports are too numerous for readers to digest efficiently. In addition, it is worth investigating whether, when similar events happen again, society learns from past experience and responds differently.
This study aims to propose a system that automatically analyzes related news reports belonging to the same topic. First, the system uses clustering to present the opinions of each stakeholder in each period of the news development. Second, it extracts the important content of news reports using summarization and provides a summary of each news event to readers. Finally, the study combines the system with the Focused Conversation Method (ORID) to evaluate the effectiveness of the system and to explore readers' responses to the news events.
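The abstract says only that the system extracts the important content of news reports using summarization; the exact algorithm is not given in this record. A minimal extractive sketch, assuming sentences are already word-segmented, scores each sentence by TF-IDF cosine similarity to the centroid of all sentences (in the spirit of centroid-based summarization) and penalizes overlap with already-chosen sentences using Maximal Marginal Relevance (MMR). The function names, the smoothed IDF formula, and the default `lam=0.7` are illustrative assumptions, not the thesis's method.

```python
import math
from collections import Counter


def tfidf_vectors(sentences: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors for word-segmented sentences (smoothed IDF)."""
    n = len(sentences)
    df = Counter(term for sent in sentences for term in set(sent))
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in Counter(sent).items()}
        for sent in sentences
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences: list[list[str]], k: int, lam: float = 0.7) -> list[int]:
    """Select up to k sentence indices by centroid relevance with an MMR penalty."""
    vecs = tfidf_vectors(sentences)
    centroid: dict[str, float] = {}
    for vec in vecs:  # the centroid direction is the sum of all sentence vectors
        for term, weight in vec.items():
            centroid[term] = centroid.get(term, 0.0) + weight
    relevance = [cosine(v, centroid) for v in vecs]

    selected: list[int] = []

    def mmr_score(i: int) -> float:
        # Redundancy = highest similarity to any sentence already chosen.
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    remaining = list(range(len(sentences)))
    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)  # ties go to the earliest sentence
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)  # keep original (e.g. chronological) order
```

Setting `lam=1.0` reduces the procedure to pure centroid ranking, which can pick near-duplicate sentences when several outlets repeat the same statement; values below 1 trade some relevance for coverage.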
With the proposed system, readers can understand the development of news events efficiently and recall their feelings, thoughts, and reactions at the moment each event happened.

Chapter 1 Introduction
1 Research Background
2 Research Motivation
3 Research Objectives
Chapter 2 Literature Review
1 Automatic Content Analysis
2 Text Summarization
3 Clustering Algorithm
3.1 Hierarchical Cluster Analysis (HCA)
3.2 Other Clustering Methods
4 Focused Conversation Method (ORID)
Chapter 3 System Framework and Methodology
1 Definition
2 System Architecture
3 Data Acquisition
4 Preprocessing
4.1 Word Segmentation
4.2 Term Aggregation
4.3 Feature Selection
5 Opinion Extraction
6 Clustering
7 Summarization
8 Content Analysis
Chapter 4 System Implementation and Results
1 Data Source
2 System Implementation
3 Results
Chapter 5 Evaluation and Results
1 Evaluation Design
2 Evaluation Results
2.1 The understanding of news events
2.2 The change of response of each reader for three events
2.3 The change of response of stakeholders for three events
3 Discussions
Chapter 6 Conclusion and Future Work
References
Appendix A. Contents Presented to Subject in Round 1
Appendix B. Summarization Results and Contents Presented to Subject in Round 2
Appendix C. ORID Interview Transcript in Round 1
Appendix D. ORID Interview Transcript in Round 2
Appendix E. Opinions of Stakeholders Across Three Events

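Full-text release date:	Not authorized for public release (on-campus network)
Full-text release date:	Not authorized for public release (off-campus network)
Full-text release date:	Not authorized for public release (National Central Library: Taiwan theses and dissertations system)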