
Graduate student: Wei, Chan-Chun (魏展君)
Thesis title: Applying machine learning and sentiment analysis to detect online posts with high depressive tendencies (應用機器學習及情緒分析偵測憂鬱高風險之網路文章)
Advisors: Ou, Kuo-Liang (區國良); Tarng, Wern-Huar (唐文華)
Oral defense committee: Wu, Yi-Chen (吳怡珍); Tseng, Chiu-Jung (曾秋蓉)
Degree: Master
Department: College of Education (竹師教育學院) - Institute of Learning Sciences and Technologies
Year of publication: 2019
Academic year of graduation: 108 (ROC calendar)
Language: Chinese
Number of pages: 58
Keywords: depression (憂鬱), self-harm (自我傷害), social media (社群媒體), text mining (文字探勘), sentiment analysis (情緒分析), machine learning (機器學習)
  • Early identification of depression risk plays an important role in the early prevention of self-harm. Related research has found that many people have no contact with mental health services before they self-harm, but may express their thoughts through social media instead. Applying text mining to find people at high risk of depression on social media has therefore become an important task. This thesis gives helping professionals, such as social workers, a fast and automated tool for screening online posts: it combines machine learning and sentiment analysis to filter out posts with high depressive tendencies hidden in online communities, so that practitioners can identify high-risk groups efficiently. The data come from the Mood board of Dcard and consist of 750 expert-validated posts, covering both depressive and general posts. The predictive performance of three machine learning classifiers (SVM, Naive Bayes, and Random Forest) is compared quantitatively, an independent test set is used to confirm the portability of the models, and the textual and emotional features specific to depressive posts are examined qualitatively. The results can be applied in school counseling systems and extended to other fields with specific emotional tendencies, such as marketing and social commentary.
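The train-then-evaluate workflow described in the abstract can be illustrated with one of the three named classifiers, Naive Bayes. The following is a minimal sketch in plain Python, not the thesis's implementation: the function names, the toy English tokens, and the labels are all hypothetical placeholders, whereas the thesis works with Jieba-segmented Chinese posts from Dcard whose features are not given in this record.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on pre-tokenized documents."""
    class_doc_counts = Counter(labels)
    token_counts = {label: Counter() for label in class_doc_counts}
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
    vocab = {t for counts in token_counts.values() for t in counts}
    return class_doc_counts, token_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-posterior for one document."""
    class_doc_counts, token_counts, vocab = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_doc_counts):
        # Log prior: share of training documents carrying this label.
        score = math.log(class_doc_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(token_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # tokens never seen in training are skipped
                score += math.log((token_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the given label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy data standing in for expert-labeled, segmented posts.
train_docs = [
    ["tired", "hopeless", "alone"],
    ["cannot", "sleep", "hopeless"],
    ["alone", "worthless", "tired"],
    ["happy", "exam", "passed"],
    ["lunch", "friends", "happy"],
    ["movie", "friends", "fun"],
]
train_labels = ["depressive"] * 3 + ["general"] * 3

# Held-out posts, kept apart from training as an independent test set.
test_docs = [["hopeless", "tired"], ["fun", "lunch"]]
test_labels = ["depressive", "general"]

model = train_naive_bayes(train_docs, train_labels)
test_accuracy = accuracy(model, test_docs, test_labels)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Evaluating SVM and Random Forest on the same held-out split would allow the kind of quantitative comparison the abstract mentions; those two classifiers are omitted here to keep the sketch short.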

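    Table of Contents
    1. Research Background and Motivation
      1.1. Research Objectives
        1. Building a prediction model for high-depression-risk posts from a sentiment lexicon
        2. Building a prediction model for high-depression-risk posts from expert knowledge
        3. Building a prediction model for high-depression-risk posts that integrates the sentiment lexicon and expert knowledge
      1.2. Research Scope
    2. Related Work
      2.1. Emotional Factors in Self-Harm Tendencies
      2.2. Classification Systems for Textual Emotion
      2.3. Methods for Building Sentiment Lexicons
      2.4. Machine Learning Applied to Text Mining
        2.4.1. Feature Engineering
      2.5. Classification Algorithms
    3. Research Methods
      3.1. Research Process
      3.2. Building the Dataset
        3.2.1. Source and Collection
        3.2.2. Expert Annotation
      3.3. Jieba Word Segmentation System
      3.4. Text Sentiment Analysis
      3.5. Depression Vocabulary Analysis
      3.6. Training and Integrating the Prediction Models
    4. Results and Analysis
      4.1. Prediction Model for High-Depression-Risk Posts Based on the Sentiment Lexicon
      4.2. Prediction Model for High-Depression-Risk Posts Based on Expert Knowledge
      4.3. Prediction Model for High-Depression-Risk Posts Integrating the Sentiment Lexicon and Expert Knowledge
    5. Conclusions and Future Applications
      5.1. Conclusions
      5.2. Future Applications
    References
    Appendix: Examples of Posts Selected by the Model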

