
Author: Chen, Yi-Ju
Title: Dementia Assessment on Mandarin Speech Using Transformer-based Acoustic and Linguistic Features
Advisor: Kuo, Po-Chih
Committee members: Kao, Hung-Yu; Huang, Li-Kai
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2024
Academic year of graduation: 112
Language: English
Pages: 46
Keywords: Dementia, Automatic Speech Recognition, Picture Description Tasks, Transformer, Large Language Model, Machine Learning
Record statistics: Views: 83; Downloads: 0
  • Dementia is a global condition, and early detection and treatment are crucial because they can prevent further deterioration of the disease. Since dementia is a combination of symptoms, diagnosing it requires a series of different tests, a process that is complex, time-consuming, and costly. Among these tests, the Cookie Theft picture description task is recognized as one of the low-cost methods with high specificity and high sensitivity for the early diagnosis of dementia, and it is commonly used in clinical practice.
    With the emergence of today's large language models, speech-based diagnosis and diagnostic support have become more convenient, and language analysis has improved markedly. The transformer, the architecture commonly used in today's large language models, is widely applied to tasks such as speech processing, natural language processing, and computer vision; its multi-head self-attention mechanism captures acoustic and linguistic features effectively. Acoustic features are extracted from the raw speech signal and reflect information such as frequency and timing, while linguistic features are extracted from text and reflect information such as grammar and semantics. Combining the two allows a more comprehensive analysis and understanding of the information carried by speech and text, offering new possibilities for the early diagnosis of dementia.
    Therefore, this study uses multiple transformer-based speech models, employing Whisper to obtain acoustic features and transcribe speech into text, and BERT to obtain linguistic features. We developed a dementia assessment system using speech-only data from the Taiwanese Mandarin Cookie Theft test, comprising 86 subjects: 28 with dementia and 58 without. The experiments use several evaluation metrics to assess dementia classification and the prediction of dementia-related scores; the final model reaches an F1-score of 85% for classification and a mean squared error of 7% for score prediction. In addition, with the same model architecture, we trained and tested on the English data of the international Alzheimer's challenge and performed additional validation on Mandarin speech from patients who all had dementia, reaching F1-scores of 86% and 99%, respectively.


    Dementia is a global disease, and early detection and treatment are crucial, as they can prevent further deterioration. The Cookie Theft picture description test (CTT) is recognized as one of the low-cost and effective methods, with high specificity and high sensitivity, for detecting dementia, and it is commonly used in clinical practice.
    The transformer, a commonly used architecture for large language models, is widely applied to tasks such as speech processing, natural language processing, and computer vision. Its multi-head self-attention mechanism effectively captures both acoustic and linguistic features. Acoustic features are extracted from the raw speech signal and carry information such as frequency and timing, while linguistic features are extracted from the text and carry information such as grammar and semantics. Combining the two feature types allows a more comprehensive analysis and understanding of the information provided by speech and text, opening new possibilities for the early diagnosis of dementia.
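The multi-head self-attention mechanism mentioned above can be sketched in a few lines. This is a generic illustration with random projection weights and toy input, not the thesis's implementation; the sequence length, feature dimension, and head count are arbitrary choices:

```python
import numpy as np

def multi_head_self_attention(X, num_heads, rng):
    """Scaled dot-product self-attention over a sequence X of shape (T, d),
    split across several heads (projection weights are random, for illustration)."""
    T, d = X.shape
    assert d % num_heads == 0
    dh = d // num_heads
    heads = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (rng.standard_normal((d, dh)) / np.sqrt(d) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(dh)                   # (T, T) pairwise scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
        heads.append(weights @ V)                        # (T, dh) per-head output
    return np.concatenate(heads, axis=-1)                # (T, d) after concatenation

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 64))   # 10 frames/tokens, 64-dim features
out = multi_head_self_attention(X, num_heads=8, rng=rng)
print(out.shape)                    # same shape as the input sequence
```

Each head attends over the whole sequence, which is why the mechanism can relate distant frames of a speech signal or distant words of a transcript.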
    Therefore, our research aims to develop a dementia assessment system using multiple transformer-based models: Whisper to obtain acoustic features and transcribe speech into text, and BERT to obtain linguistic features. We used only speech data from the Taiwanese Mandarin CTT, comprising 86 subjects: 28 with dementia and 58 without. The experiments use a variety of evaluation metrics to assess dementia classification and the prediction of dementia-related scores. The final model achieves an F1-score of 85% for classification and a mean squared error of 7% for score prediction. Additionally, using the same model architecture, we trained and tested on the English data of the international Alzheimer's challenge and performed external validation on Mandarin speech from dementia patients, reaching F1-scores of 86% and 99%, respectively.
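The combination of Whisper acoustic features and BERT linguistic features described above can be illustrated as simple late fusion. The embedding sizes, pooling, and linear head below are assumptions for illustration only; random vectors stand in for real model outputs so the sketch runs without any model downloads:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholders (random, hypothetical dimensions): in the thesis's pipeline the
# acoustic embedding would come from Whisper's encoder and the linguistic
# embedding from BERT applied to the transcript.
acoustic_emb = rng.standard_normal(384)    # e.g. a pooled Whisper encoder state
linguistic_emb = rng.standard_normal(768)  # e.g. a pooled BERT [CLS] state

# Late fusion: concatenate the two modality embeddings into one vector.
fused = np.concatenate([acoustic_emb, linguistic_emb])  # shape (1152,)

# Feed the fused vector to a linear classification head (weights are random
# here; in practice they would be learned from labeled data).
W = rng.standard_normal((2, fused.shape[0])) * 0.01
logits = W @ fused
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over {non-dementia, dementia}
print(fused.shape, probs)
```

The same fused representation could feed a regression head for dementia-related score prediction instead of a softmax classifier.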

    Contents
    Abstract (Chinese)
    Abstract
    Acknowledgements (Chinese)
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    1 Introduction
    2 Related Work
      2.1 Traditional Method
      2.2 Automatic Speech Recognition
      2.3 Large Language Models
      2.4 Alzheimer's Disease Classification Competition
    3 Dataset
      3.1 Local Data
      3.2 Lu Corpus of DementiaBank
      3.3 ADReSSo Dataset
    4 Methodology
      4.1 Data Pre-processing
      4.2 Modeling
        4.2.1 Whisper Model Architecture
        4.2.2 BERT Model Architecture
      4.3 Model Evaluation
        4.3.1 5-fold Cross-Validation
        4.3.2 Evaluation Metrics
      4.4 Model Interpretation
        4.4.1 Whisper Attention Maps
        4.4.2 BERT Attention Maps
    5 Results
      5.1 Results of Dementia Detection
        5.1.1 Overview
        5.1.2 Experiment Settings
        5.1.3 Results
      5.2 Results of Score Prediction
        5.2.1 Overview
        5.2.2 Experiment Settings
        5.2.3 Results
      5.3 English Data
        5.3.1 ADReSSo Challenge
      5.4 External Validation
        5.4.1 Lu Corpus
      5.5 Model Interpretation and Visualization
        5.5.1 Whisper Attention Maps
        5.5.2 BERT Attention Maps
    6 Discussion
      6.1 Model Version Selection
      6.2 Discussion of Classification Performance
      6.3 Discussion of Regression Performance
      6.4 Discussion of English Data
      6.5 Discussion of External Validation
      6.6 Discussion of Model Interpretation
    7 Conclusion
    Bibliography
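The 5-fold cross-validation and evaluation metrics listed in the contents can be sketched as follows. The labels mirror the reported class balance (28 dementia, 58 non-dementia), but the all-positive placeholder predictions and toy score values are stand-ins, not the thesis's model:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def five_fold_indices(n, seed=0):
    """Shuffle the n sample indices and split them into five roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, 5)

# Toy labels matching the reported cohort: 28 dementia (1), 58 non-dementia (0).
y = np.array([1] * 28 + [0] * 58)
scores = []
for fold in five_fold_indices(len(y)):
    y_test = y[fold]
    y_pred = np.ones_like(y_test)   # placeholder "model": predicts dementia for all
    scores.append(f1_score(y_test, y_pred))
mean_f1 = np.mean(scores)           # average F1 across the five held-out folds
print(mean_f1)

# Score prediction would be evaluated analogously with mean squared error:
mse = np.mean((np.array([24.0, 27.0]) - np.array([25.0, 26.0])) ** 2)  # = 1.0
```

In a real run, each fold's training portion would fit the model and the held-out portion would supply `y_pred`; only the placeholder step changes.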

