Author: | 趙仁豪 Chao, Jen-Hao |
---|---|
Thesis title: | 以文字探勘分析小說文本人物性格之研究 (Research of Character Analysis in Novel Based on Text Mining) |
Advisors: | 區國良 Ou, Kuo-Liang; 唐文華 Tang, Wern-Huar |
Oral defense committee: | 王鼎銘; 張慈宜 |
Degree: | Master |
Department: | 竹師教育學院 (College of Education) - Institute of Learning Sciences and Technologies |
Year of publication: | 2022 |
Graduation academic year: | 111 (ROC calendar) |
Language: | Chinese |
Number of pages: | 69 |
Keywords (Chinese): | 機器學習, 自然語言處理, 性格分析, 劇本篩選, 對立原則 |
Keywords (English): | machine learning, natural language processing, personality analysis, script filtering, principle of antagonism |
For film and television productions, an excellent story script is one of the key factors in a work's success. Producers and directors have traditionally spent considerable time and labor selecting scripts from candidates, and the process is hard to standardize or speed up. This thesis therefore uses a model that combines BERT-base with a multilayer perceptron (MLP) to analyze the characters in a novel according to the Big Five personality traits, and applies the "principle of antagonism" to interpret how a character's personality changes as the story develops, in order to identify compelling characterization. A paired-samples t-test confirms that, under this method, the characters' personalities do change significantly. Finally, the results are presented in an interactive visual dashboard built with the Tableau cloud visualization tool, which lets users quickly grasp the characters and the story's development and so assists in creating and selecting stories, for example by helping producers and directors judge whether a story is suitable before reading the entire novel. In-depth interviews further confirm that the tool helps practitioners in their work, and the study can serve as a reference for future related research and digital humanities applications.
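The paired-samples t-test mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's own code: the function name `paired_t_statistic`, the choice to pair each character's score from an early part of the story with the same character's score from a late part, and all numbers below are assumptions invented for illustration.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired-samples t-test.

    `before` and `after` hold one trait score per character, in the
    same character order, so that element i of each list forms a pair.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    n = len(before)
    if n < 2:
        raise ValueError("need at least two pairs")
    # The test works on the per-pair differences, not the raw scores.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for one Big Five trait across eight characters
# (values invented for illustration, not taken from the thesis).
early = [0.42, 0.35, 0.58, 0.47, 0.51, 0.39, 0.44, 0.60]
late = [0.55, 0.41, 0.63, 0.59, 0.50, 0.52, 0.49, 0.71]

t_stat, dof = paired_t_statistic(early, late)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The resulting statistic would then be compared against the t distribution with `dof` degrees of freedom to obtain a p-value; where SciPy is available, `scipy.stats.ttest_rel` performs the same test and reports the p-value directly.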