Author: | 魏展君 Wei, Chan-Chun |
---|---|
Thesis title: | 應用機器學習及情緒分析偵測憂鬱高風險之網路文章 Applying machine learning and sentiment analysis to detect online posts with high depressive tendencies |
Advisors: | 區國良 Ou, Kuo-Liang; 唐文華 Tarng, Wern-Huar |
Oral defense committee: | 吳怡珍 Wu, Yi-Chen; 曾秋蓉 Tseng, Chiu-Jung |
Degree: | Master |
Department: | 竹師教育學院 - 學習科學與科技研究所 Institute of Learning Sciences and Technologies |
Year of publication: | 2019 |
Academic year of graduation: | 108 (ROC calendar) |
Language: | Chinese |
Number of pages: | 58 |
Keywords (Chinese): | 憂鬱、自我傷害、社群媒體、文字探勘、情緒分析、機器學習 |
Keywords (English): | depression, self-harm, social media, text mining, sentiment analysis, machine learning |
Early identification of depression risk plays an important role in the early prevention of self-harm. Related research shows that many people have no contact with mental health services before they self-harm, but they may express their thoughts through social media. Applying text mining to find people at high risk of depression hidden in social media has therefore become an important task. This thesis provides helping professionals with a fast, automated tool for screening online posts. The tool combines machine learning and sentiment analysis to filter out posts with high depressive tendencies hidden in online communities, so that practitioners can efficiently keep track of high-risk groups online. The data come from the Dcard mood board (心情版): 750 expert-validated posts in total, comprising depressive posts and general posts. The predictive performance of three machine learning classifiers, SVM, Naive Bayes, and Random Forest, is compared quantitatively, and an independent test set is used to confirm that the models are portable to unseen data. The textual and emotional features characteristic of depressive posts are examined through qualitative analysis. In the future, the results may be applied in school counseling systems and extended to other fields with specific emotional tendencies, such as marketing and social commentary.
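The evaluation design described in the abstract (train a classifier on labeled posts, then score it on an independent test set) can be illustrated with a minimal sketch. This is not the thesis's implementation: it covers only one of the three classifier families named above (Naive Bayes, written here in plain Python with add-one smoothing), it assumes posts are already tokenized, and the tiny English token lists and the label names `high_risk` and `general` are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on pre-tokenized documents."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # per-class token counts
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"prior": {}, "loglik": {}, "vocab": vocab}
    for c, n_docs in class_docs.items():
        # Laplace (add-alpha) smoothing keeps unseen class/word pairs non-zero.
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        model["prior"][c] = math.log(n_docs / len(labels))
        model["loglik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return model


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for c, prior in model["prior"].items():
        # Tokens outside the training vocabulary are ignored.
        scores[c] = prior + sum(
            model["loglik"][c][w] for w in tokens if w in model["vocab"]
        )
    return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Hypothetical toy data (invented; not taken from the Dcard corpus).
train_docs = [
    ["tired", "alone", "cry"],
    ["hopeless", "tired", "sad"],
    ["happy", "exam", "passed"],
    ["lunch", "friends", "happy"],
]
train_labels = ["high_risk", "high_risk", "general", "general"]

# Independent test set: never seen during training.
test_docs = [["sad", "alone"], ["friends", "exam"]]
test_labels = ["high_risk", "general"]

model = train_nb(train_docs, train_labels)
print("held-out accuracy:", accuracy(model, test_docs, test_labels))
```

Keeping the test posts entirely out of `train_nb` is the point of the sketch: the score then reflects how the model behaves on posts it has not seen, which is the portability concern the abstract raises. A comparison across SVM, Naive Bayes, and Random Forest would repeat the same split and scoring for each classifier.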