Graduate student: 陳金博
CHEN, CHIN-PO
Thesis title: 使用語音技術量化研究自閉症光譜
Using speech-based technology to characterize autistic traits of autism spectrum disorder
Advisor: 李祈均
Lee, Chi-Chun
Oral defense committee: 劉奕汶
Liu, Yi-Wen
曹昱
Yu, Tsao
黃元豪
Huang, Yuan-Hao
高淑芬
Gau, Shur-Fen
Degree: Doctor (博士)
Department: 電機資訊學院 - 電機工程學系
Department of Electrical Engineering
Year of publication: 2024
Academic year of graduation: 112
Language: Chinese
Number of pages: 106
Keywords (Chinese): 自閉症、語音訊號處理、深度學習
Keywords (English): autism spectrum disorder, speech signal processing, deep learning
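  • People bring social skills to communication in order to make it smoother, and
    in doing so they show specific patterns such as accommodation. When these
    patterns are recorded with sensors, similar phenomena can be observed from a
    signal perspective, often in finer detail than people usually notice. Autism is
    described as a disorder of communication and social interaction, and people with
    autism are regarded as having deficits in communication. Until now the condition
    has been described from a medical perspective. Possibly because finer-grained
    ways of defining autism are lacking, its definition remains incomplete, its
    heterogeneity is high, and its diagnostic definitions change readily. This study
    therefore takes a signal perspective and generates speech analytics to analyze
    and express autistic traits. We first use non-verbal analytics to explore subtle
    differences among three subtypes of autism. We then quantify speech and language
    to examine differences in strength across several sub-dimensions of communication
    ability in autism. Finally, we observe characteristics of the vowel space to
    examine differences between people with autism and typically developing people,
    and between severe and mild autism. Our research found insights in all three
    subtopics, and these insights offer another angle from which to view the
    distinctive behavioral characteristics of the autism spectrum.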
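    The main contributions of this dissertation are twofold: the development of
    speech algorithms and the insights gained from speech analytics. In each
    subtopic, we developed speech algorithms to generate speech analytics. Besides
    supporting a better understanding of autistic traits, these speech analytics are
    numerical values that can be fed into downstream machine learning classifiers.
    This property makes automatic assessment in clinical scenarios possible. These
    analytics therefore have the potential to be integrated into edge devices as
    efficient and reliable assessment tools. In addition, disease progression
    tracking and autism risk screening in daily life are potential applications that
    are still lacking. As for the insights gained from speech analytics, they offer
    a new angle from which to discuss autistic traits. For example, we found a
    special turn-taking structure in high-functioning autism. How the turn-taking
    interaction unfolds and what causes this phenomenon are therefore worth deeper
    investigation in the future. With advanced tools and multiple angles from which
    to characterize autistic traits, we can study ASD in greater depth and specify
    its concrete behavioral manifestations for better understanding.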


    In face-to-face communication, people draw on rich communication skills to
    convey ideas and intentions and to make interactions smoother. In this setting,
    specific patterns can be observed, such as accommodation in speech tone or
    syntax. These patterns can be characterized by recording and analyzing them.
    Moreover, from a signal perspective, analytics that characterize human behavior
    can reveal details that people usually overlook. For example, prosodic
    accommodation manifests in the duration of each utterance, voice quality, and
    voice intensity.
    Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by
    deficits in communication and social reciprocity and by stereotyped behavior.
    The current definition of ASD is still vague, and diagnostic criteria change
    with each new version of the Diagnostic and Statistical Manual of Mental
    Disorders (DSM). Understanding the disease progression of autism is challenging
    because of its high heterogeneity. So far, researchers have studied ASD mainly
    from a clinical angle, and studies of ASD from a signal-level point of view are
    still lacking. With technology that allows us to process and analyze signals
    from speech and language sensors such as microphones, it is worthwhile to study
    autistic traits with computational approaches.
    Therefore, this study aims to analyze traits of ASD through signal processing,
    utilizing speech analytics. In our first study, we explore subtle differences
    among three subtypes of autism through non-verbal analytics. Then, we develop
    verbal speech analytics to characterize the speech and language involved in
    dialogue and examine their relationship with several psychological constructs of
    communication deficits. Lastly, we investigate differences between people with
    autism and typically developing people, as well as differences between severe
    and mild cases of autism, by observing characteristics in the vowel space.
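    As a rough illustration of what a vowel space measure can look like, the sketch
    below computes a triangular vowel space area from the mean first and second
    formants (F1, F2) of three corner vowels, using the shoelace formula. It is a
    generic textbook-style measure and is not claimed to be the VSC features
    developed in the thesis. The function name `vowel_space_area` and the formant
    values are assumptions made up for this example.

```python
def vowel_space_area(corner_formants):
    """Area (in Hz^2) of the polygon spanned by corner vowels in the F1-F2 plane.

    corner_formants: sequence of (F1, F2) pairs in Hz, listed in polygon order.
    Uses the shoelace formula, so it works for a triangle or a quadrilateral.
    """
    points = list(corner_formants)
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        f1_a, f2_a = points[i]
        f1_b, f2_b = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += f1_a * f2_b - f1_b * f2_a
    return abs(twice_area) / 2.0


# Hypothetical mean (F1, F2) values in Hz for /a/, /i/, /u/ (illustration only).
example_corners = [(850.0, 1300.0), (300.0, 2300.0), (350.0, 800.0)]
example_area = vowel_space_area(example_corners)
print(example_area)
```

    A smaller area would indicate more centralized vowels under this simple
    measure; how such a value relates to autistic traits is a question for the
    thesis itself, not something this sketch establishes.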
    This dissertation makes two major contributions: speech algorithm development
    and insights from speech analytics. In each subtopic, we develop speech
    algorithms to generate speech analytics. Besides supporting a better
    understanding of autistic traits, these speech analytics are numerical values
    that can be fed into downstream machine learning classifiers. This property
    enables automatic assessment in clinical scenarios. Hence, these analytics have
    the potential to be integrated into edge devices as efficient and reliable
    assessment tools. In addition, disease progression tracking and ASD risk
    screening in daily life are possible applications that are still lacking. As for
    the insights from speech analytics, they provide a new point of view from which
    to discuss autistic traits. For example, we found a special turn-taking
    structure in high-functioning autism. Hence, how the turn-taking interaction
    unfolds and what causes this phenomenon are worthy of future investigation. With
    advanced tools and multiple angles from which to characterize autistic traits,
    we can push deeper into the study of ASD and specify its concrete behavioral
    manifestations for better understanding.
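    To make concrete the remark that numerical speech analytics can be passed to a
    downstream classifier, the sketch below trains a nearest-centroid classifier on
    small feature vectors. The choice of classifier, the function names
    `fit_centroids` and `predict`, the two feature columns, the group labels, and
    all values are assumptions for illustration; the abstract does not say which
    classifiers or features the thesis uses.

```python
from math import dist


def fit_centroids(features, labels):
    """Return {label: mean feature vector} for the training samples."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Assign vec to the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))


# Toy, made-up analytics per session: [mean turn length (s), vowel space area (kHz^2)].
train_features = [[1.0, 0.40], [1.2, 0.38], [2.0, 0.20], [2.2, 0.22]]
train_labels = ["group_a", "group_a", "group_b", "group_b"]

centroids = fit_centroids(train_features, train_labels)
print(predict(centroids, [1.05, 0.41]))
print(predict(centroids, [2.3, 0.20]))
```

    In practice the feature columns would need scaling before a distance-based
    method is applied, since analytics measured in different units are not directly
    comparable.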

    Chinese Abstract (摘要)
    Abstract
    Acknowledgements
    1 Introduction
      1.1 Background and Motivation
        1.1.1 Research goal and contribution
      1.2 Dissertation organization
    2 Research Methodology
      2.1 Resources
        2.1.1 The Autism Diagnostic Observation Schedule (ADOS)
        2.1.2 Audio-Video Data Collection
      2.2 The behavior characterizing scaffold
    3 Toward Differential Diagnosis of Autism Spectrum Disorder using Multimodal Behavior Descriptors and Executive Functions
      3.1 Introduction
      3.2 Clinical labels
        3.2.1 Clinical measurement of behavior ratings – ADOS
        3.2.2 Clinical measurement of executive function – CANTAB
      3.3 Research Methodology
        3.3.1 Audio-Video Low-Level Descriptors (LLDs)
        3.3.2 Segment-Level Features
        3.3.3 Session-Level Features
      3.4 Experimental Setup and Results
        3.4.1 Data cohort
        3.4.2 Experiment I Results and Discussions
        3.4.3 Experiment II Results and Discussions
        3.4.4 Experiment III Results
      3.5 Discussion
      3.6 Conclusions
      3.7 Appendix
    4 Learning Converse-level Multimodal Embedding to Assess Social Deficit Severity for Autism Spectrum Disorder
      4.1 Introduction
      4.2 Data cohort
      4.3 Research Methodology
        4.3.1 Converse-level Unit Definition
        4.3.2 Converse-level Embedding (Lex)
        4.3.3 Converse-level Embedding (Acous)
        4.3.4 GRU with Attentive DNN Fusion Network
      4.4 Experimental Setup and Results
        4.4.1 Results
        4.4.2 Analysis and Discussion
        4.4.3 Analysis of Turn Length
        4.4.4 Analysis of dialogue perplexity
        4.4.5 Conclusions
    5 Using Measures of Vowel Space for Autistic Traits Characterization
      5.1 Introduction
      5.2 Related Works
        5.2.1 Speech production-related communication impairment
        5.2.2 Interaction-oriented social reciprocity deficits
      5.3 Methods
        5.3.1 Preprocessing
        5.3.2 Utterance-level VSC features
        5.3.3 Conversation-level VSC features
      5.4 Experiments
        5.4.1 Data cohort
        5.4.2 Definition of experimental parameters
        5.4.3 Model explanation through SHAPley analysis
        5.4.4 Additional features for this study
        5.4.5 Experiment 1: classification of ASD/TD
        5.4.6 Analysis of the classification tasks
        5.4.7 Experiment 2: regression of communication deficit score
        5.4.8 Analysis of the regression tasks
      5.5 Discussion
      5.6 Conclusion
      5.7 Appendix
    6 Limitation and Conclusion
      6.1 Limitations about this thesis
      6.2 Conclusion
    References

