
Author: 林姸均
Lin, Yen-Chun
Thesis title: 以文字探勘技術支援社群媒體自我傷害高風險訊息偵測
Using Text Mining Techniques for High-Risk Suicide Messages Detection on Social Media
Advisors: 區國良
Ou, Kuo-Liang
唐文華
Tarng, Wern-Huar
Oral examination committee: 李昆樺
Lee, Kun-Hua
劉奕蘭
Liu, Yih-Lan
Degree: Master
Department: 竹師教育學院 - 學習科學與科技研究所
Institute of Learning Sciences and Technologies
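Year of publication: 2023
Academic year of graduation: 111
Language: Chinese
Number of pages: 70
Keywords (Chinese): text mining, machine learning, suicide prevention, automatic labeling, anomaly detection
Keywords (English): Automatic Labeling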
    According to Taiwan's 2020 cause-of-death statistics, intentional self-harm was the 11th leading cause of death in Taiwan. Such behavior is shaped by a person's physiological and psychological state as well as by the external environment. Although suicide prevention is highly challenging for practitioners, the posts of social media users offer a way to understand a poster's inner state. Reading large numbers of posts manually, however, is costly in both labor and time. This thesis therefore uses text mining to analyze the characteristics of potentially high-risk self-harm messages on social media, combines BERT-based automatic labeling with a one-class SVM to detect posts that indicate high suicide or self-harm risk, and builds a visual web interface to support early prevention. Domain experts were then interviewed to evaluate how well the system works in use. The data consist of 2,664 posts published in 2019 on Dcard, an anonymous social platform in Taiwan, labeled by a professor of Educational Psychology and Counseling at the university and four master's-level students from the same department. Text augmentation (oversampling) was applied to the emotion-cognition labels to address class imbalance. The results show that text mining reveals differences in what users discuss across posts of different risk levels. The automatic labeling model that combines BERT with text augmentation reached an accuracy of 0.84, a precision of 0.85, and an F1-score of 0.85, a clear improvement over the automatic labeling model without BERT. The experts were also satisfied with the visual web page of the automatic labeling model and considered the technology applicable to their work and likely to reduce manual misjudgment.
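The abstract reports accuracy, precision, and F1-score for the automatic labeling model. As a minimal sketch of how such figures are usually derived from predicted and expert-assigned labels, the example below computes them for a single positive class. The function name `binary_metrics`, the label encoding (1 = high-risk post, 0 = other post), and the sample labels are all hypothetical and are not taken from the thesis, which may have computed or averaged its metrics differently.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: true positive if correct, else false positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: false negative if a positive was missed.
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for illustration only: 1 = high-risk post, 0 = other post.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Precision and F1 are reported alongside accuracy because, with imbalanced data such as a small share of high-risk posts, accuracy alone can look high even when most high-risk posts are missed.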

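    Chapter 1. Introduction
        1.1 Research Background and Motivation
        1.2 Research Objectives
    Chapter 2. Literature Review
        2.1 Suicide and Social Media
        2.2 Machine Learning, Deep Learning, and Suicide Prevention
        2.3 Text Mining and Sentiment Analysis
        2.4 Automatic Labeling
        2.5 Anomaly Detection
    Chapter 3. Research Methods
        3.1 Research Process
        3.2 Data Source and Expert Labeling
        3.3 Data Processing
        3.4 Word-Frequency Model for Assessing Post Risk Level
        3.5 Grouped Model for Assessing Post Risk Level
        3.6 Automatic Labeling Model
        3.7 Model Deployment and Interviews
    Chapter 4. Results
        4.1 Data Analysis
            4.1.1 Statistical Analysis of Expert Labels
            4.1.2 LIWC Analysis Results by Post Risk Level
            4.1.3 LDA Analysis Results by Post Risk Level
        4.2 Model Comparison
            4.2.1 Comparison of Word-Frequency Models for Assessing Post Risk Level
            4.2.2 Results of the Grouped Model for Assessing Post Risk Level
            4.2.3 Results of the Automatic Labeling Model
        4.3 Analysis of Feedback on the Web Visualization
    Chapter 5. Conclusions and Future Work
        5.1 Conclusions
            5.1.1 Analyzing the Characteristics of Potentially High-Risk Self-Harm Messages on Social Media
            5.1.2 Automated Labeling of Social Media Posts
            5.1.3 Presenting Automatic Labeling Results and Post Risk in a Visual Web Interface
        5.2 Suggestions for Future Research
    References
    Appendix 1. Expert Labeling Instructions
    Appendix 2. Emotion-Cognition Automatic Labeling Results
    Appendix 3. Post Risk Level Automatic Labeling Results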

