Graduate student: | 張薾云 Chang, Erh-Yun |
---|---|
Thesis title: | Early Detection for Cognitive Impairment in Taiwanese Elderly People based on Features of Speech (基於台灣本地年長者之語音特徵偵測早期認知功能障礙) |
Advisor: | 劉奕汶 Liu, Yi-Wen |
Oral defense committee: | 王新民 Wang, Hsin-Min; 王道維 Wang, Daw-Wei; 李祈均 Lee, Chi-Chun; 蘇宜青 Su, Yi-Ching |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2024 |
Academic year of graduation: | 113 (ROC calendar) |
Language: | English |
Number of pages: | 57 |
Keywords: | mild cognitive impairment, acoustic features, speech rate, vowel space characteristics, feature selection |
Dementia is a brain disease classified as a neurocognitive disorder that causes a gradual decline in thinking and memory. Common symptoms include reduced language ability, comprehension, and mobility, which seriously affect the daily lives of patients and their families. Dementia currently has no cure; only detection and treatment at an early stage of cognitive impairment offer a chance of improvement. This study uses recordings of participants taking cognitive function tests to examine, from the perspective of speech, how dementia patients differ from normally aging individuals, and further investigates the effect of mild cognitive impairment (MCI) on speech.
This thesis constructs a cognitive function speech corpus of elderly people in Taiwan, comprising audio recordings and transcripts. We extract features from three aspects of the participants' speech: acoustics, speech rate, and articulation. The target problem is split into two stages: classifying dementia patients versus non-demented individuals, and classifying MCI versus healthy controls (HC). The feature sets are combined in various ways, and a sequential feature selection algorithm is used to pick the most influential features for training machine learning models.
The experimental results show that using features from these three aspects improves the models' ability to predict cognitive impairment. Finally, we analyze the impact of each feature on the models' predictions and find that the trend of each feature set changes with the cognitive test task. This suggests that dementia patients exhibit different speech characteristics when performing different types of cognitive tasks. Our analysis also highlights how patients with cognitive impairment differ from normally aging adults across various aspects, and how the speech of dementia patients differs from that of patients with MCI.
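The sequential feature selection step mentioned in the abstract can be illustrated with a minimal sketch of greedy forward selection. Everything concrete here is an assumption made for illustration: the abstract does not state which classifier, scoring scheme, or selection direction the thesis uses, so this sketch substitutes a nearest-centroid classifier scored by leave-one-out accuracy, and the three toy columns (a speech-rate-like value, a pause-ratio-like value, and a noise column) are hypothetical rather than taken from the corpus.

```python
from statistics import mean


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`.

    The classifier is an illustrative stand-in, not the thesis's model.
    """
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except sample i.
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[c] for r in rows) for c in cols]
        # Predict the class whose centroid is nearest (squared Euclidean distance).
        pred = min(
            centroids,
            key=lambda lab: sum(
                (X[i][c] - m) ** 2 for c, m in zip(cols, centroids[lab])
            ),
        )
        correct += pred == y[i]
    return correct / len(X)


def sequential_forward_selection(X, y, k):
    """Greedily add the feature that most improves the score until k are chosen."""
    selected = []
    remaining = list(range(len(X[0])))
    while len(selected) < k and remaining:
        # Ties are broken in favour of the lowest column index.
        best = max(remaining, key=lambda c: loo_accuracy(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: columns are [rate-like, pause-like, noise].
X = [
    [4.1, 0.10, 0.5],
    [3.9, 0.12, 0.1],
    [4.3, 0.09, 0.9],
    [4.0, 0.11, 0.3],
    [2.8, 0.25, 0.4],
    [3.0, 0.22, 0.8],
    [2.6, 0.27, 0.2],
    [2.9, 0.24, 0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = impaired (toy labels)

selected = sequential_forward_selection(X, y, k=2)
print(selected)
```

Each round re-scores every remaining candidate together with the features already chosen, so a feature is kept for what it adds to the current subset rather than for its score in isolation. In practice a library routine such as scikit-learn's `SequentialFeatureSelector` would likely replace the hand-written loop.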