
Graduate Student: Lai, Yen-Yu
Thesis Title: Using Speech Characteristics from Children's Story Narratives to Detect Autistic Tendencies through Deep Learning Methods
Advisor: Chen, Arbee L. P.
Committee Members: Shen, Chih-Ya; Chien, Jen-Tzung
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Publication Year: 2024
Academic Year of Graduation: 112
Language: English
Pages: 45
Keywords: Autism, Speech Signal Processing, Deep Learning


    The number of children diagnosed with Autism Spectrum Disorder (ASD) is continually increasing. According to estimates by the World Health Organization, approximately 1 in 100 children worldwide is affected by autism. However, diagnosing autism is not straightforward; the process is lengthy and complex. Previous research has indicated that children with autism exhibit deficits in using language within social contexts, such as storytelling skills. Additionally, children with autism may display distinct patterns in certain acoustic features compared to typically developing children. With the advancement of computational models, we aim to employ deep neural network models to rapidly analyze the acoustic features of children's narratives for indications of autism. This approach is expected to enhance diagnostic efficiency.
    In this study, we collected narrative data from 12 children with autism and 19 typically developing (TD) children using Module 3 of the standardized tool ADOS-2 (Autism Diagnostic Observation Schedule, Second Edition). The children were asked to narrate the picture book "Tuesday." We then represented the acoustic features using Mel-Frequency Cepstral Coefficients (MFCCs) and employed computational model architectures for training and classification. Our data collection was conducted in two phases. In the first phase, we did not provide much guidance to the children; they only needed to roughly narrate the story after reading it. We only gave prompts and guidance when the children were not speaking. In the second phase, we introduced more questions and prompts to encourage participants to narrate more content from the picture book. We also examined how the performance of our model varied between the two data collection processes to determine which procedure was more effective in identifying tendencies indicative of autism.
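    The MFCC representation mentioned above can be illustrated with a compact from-scratch sketch: frame the signal, take the power spectrum, apply a triangular mel filterbank, then a DCT of the log energies. This is a generic textbook pipeline with illustrative parameters (16 kHz audio, 512-sample frames, 26 mel bands, 13 coefficients), not the thesis's exact configuration, and the input signal is synthetic.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)

def mfcc(signal, sr, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # Frame the signal with a Hamming window.
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 Hz to the Nyquist frequency.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    # Log filterbank energies, then DCT to decorrelate -> cepstral coefficients.
    energies = np.log(power @ fbank.T + 1e-10)
    return dct(energies, type=2, axis=1, norm='ortho')[:, :n_ceps]

sr = 16000
t = np.arange(sr) / sr
sig = 0.5 * np.sin(2 * np.pi * 220 * t)   # synthetic stand-in for a narration clip
feats = mfcc(sig, sr)
print(feats.shape)  # (frames, 13): one 13-dimensional MFCC vector per frame
```

In practice a library extractor (e.g. librosa or openSMILE, the latter cited in the thesis's references) would replace this hand-rolled version; the sketch only makes the transformation from waveform to per-frame feature vectors concrete.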
    Notably, through t-tests on our dataset we identified 10 low-level descriptors (LLDs) that differ significantly between the ASD and TD groups. These descriptors capture fundamental characteristics of the speech signal. We combined these 10 LLDs with the MFCCs as inputs to the model. To ensure the model's practical applicability, we used a speaker-independent data split, guaranteeing that speech clips from the same child never appear in both the training and testing sets. With this setup, our model achieved an F1 score of 89.4%. Further analysis of the 10 LLDs showed that they indeed capture the speech-characteristic differences between ASD and TD children reported in previous studies, indicating that our model classifies based on these differences.
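    The two evaluation steps described above can be sketched as follows: a two-sample t-test keeps only descriptors that differ significantly between groups, and a group-aware split keeps every clip from one child on a single side of the train/test boundary. All data here are synthetic and the variable names are illustrative; this is not the thesis's actual feature set or threshold choice.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_clips, n_llds = 60, 20
labels = rng.integers(0, 2, n_clips)        # 0 = TD, 1 = ASD (synthetic labels)
speakers = rng.integers(0, 10, n_clips)     # which child each clip came from
llds = rng.normal(size=(n_clips, n_llds))
llds[labels == 1, :5] += 1.0                # make the first 5 LLDs truly differ

# Keep LLDs whose group difference is significant at p < 0.05.
_, p = ttest_ind(llds[labels == 0], llds[labels == 1], axis=0)
selected = np.where(p < 0.05)[0]
print("selected LLDs:", selected)

# Speaker-independent split: no child appears in both train and test.
gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(gss.split(llds, labels, groups=speakers))
assert set(speakers[train_idx]).isdisjoint(speakers[test_idx])
```

The speaker-independent split is what makes the reported F1 score meaningful in a screening scenario: the model is always evaluated on children it has never heard during training.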

    Abstract (Chinese) i
    Abstract iii
    Acknowledgment v
    Table of Contents vi
    List of Tables viii
    List of Figures ix
    1. Introduction 1
    2. Related Work 6
    2.1. Mel-Frequency Cepstral Coefficients 6
    2.2. Low Level Descriptors 7
    2.3. Classifying Autism from Picture Book Narratives 9
    2.4. Classifying Autism from Speech Recordings 10
    3. Dataset 11
    4. Method 14
    4.3. Acoustic Features 16
    4.3.1. MFCCs 16
    4.3.2. LLDs 17
    4.4. Embedding Models 18
    4.4.1. TDNN 18
    4.4.2. LSTM 22
    5. Experiments 25
    5.1. Performance of different methods 25
    5.2. Performance under speaker-independent data partitioning 26
    5.3. The impact of different data augmentation techniques on the model 30
    5.4. Comparison of the model's performance under different data collection processes 31
    5.5. Model's performance after incorporating LLDs 33
    5.6. Comparison with previous study 38
    6. Conclusion and Future Work 40
    Reference 41

