
Graduate student: Tseng, Xian-Hong (曾憲泓)
Thesis title: 利用LSTM演算法基於自閉症診斷觀察量表訪談建置辨識自閉症小孩之評估系統
Using LSTM algorithm to establish an Evaluation System for Child with Autistic Disorder during Autism Diagnostic Observation Schedule Interview
Advisor: Lee, Chi-Chun (李祈均)
Committee members: Chi, Tai-Shih (冀泰石); Tsao, Yu (曹昱); Chiang, Chen-Yu (江振宇)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2017
Academic year of graduation: 106
Language: Chinese
Number of pages: 46
Keywords: autism spectrum disorder, autism diagnostic observation schedule, long short-term memory, behavioral signal processing, multimodal behaviors
    Autism spectrum disorder (ASD) is a highly prevalent neurodevelopmental disorder. Many medical studies characterize it by difficulties in social interaction, communication deficits, and restricted, repetitive behaviors, which often make interactions with individuals with ASD feel unusual to others. Based on the overall behavioral presentation, ASD can be further divided into three types: Classic Autism (AD), Asperger syndrome (AS), and High-Functioning Autism (HFA). The severity and type of these symptoms are confirmed by professional physicians through clinical observation and auxiliary diagnostic instruments, among which the Autism Diagnostic Observation Schedule (ADOS) is the gold-standard auxiliary diagnostic tool in this field. However, current methods of evaluating autism suffer from human subjectivity, are time-consuming, and do not scale easily. To build an evaluation system that can recognize children with autism, this thesis uses the storytelling segment of the interactive ADOS interview and models multimodal behavioral signals from audio and video with the LSTM algorithm and machine learning models, realizing the concept of behavioral signal processing. The experimental results show that the system achieves good performance in recognizing autism and in distinguishing its three types; moreover, in distinguishing the three types, our system also outperforms the scores assigned by research assistants during the ADOS. The experiments in this thesis demonstrate that such an autism evaluation system is highly feasible. By combining medicine and engineering across disciplines, we hope the autism diagnostic process can further gain consistency, reproducibility, and objectivity.
    Keywords: autism spectrum disorder, autism diagnostic observation schedule, long short-term memory, behavioral signal processing, multimodal behaviors
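
    The abstract describes extracting acoustic behavioral signals from the recorded storytelling segment before any modeling. As a rough illustration only (the thesis's exact feature set, toolchain, and frame settings are not specified here), the following Python sketch shows how frame-level acoustic descriptors such as MFCCs and RMS energy could be pulled from a session recording with librosa; the file path and parameter choices are placeholders, not the author's pipeline.

# A minimal, illustrative sketch: frame-level acoustic features for one
# storytelling segment. Library choice (librosa), frame settings, and the
# file path are assumptions, not the thesis's actual pipeline.
import numpy as np
import librosa

def extract_acoustic_frames(wav_path, sr=16000, frame_len=0.025, hop=0.010):
    """Return a (num_frames, feat_dim) matrix of per-frame acoustic features."""
    y, sr = librosa.load(wav_path, sr=sr)
    n_fft = int(frame_len * sr)
    hop_length = int(hop * sr)

    # 13 MFCCs per frame (spectral-envelope descriptors).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=n_fft, hop_length=hop_length)
    # Frame-level RMS energy (loudness proxy).
    rms = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop_length)

    # Stack into a time-major sequence for a downstream sequence model.
    feats = np.vstack([mfcc, rms]).T          # shape: (frames, 14)
    return feats

if __name__ == "__main__":
    frames = extract_acoustic_frames("ados_storytelling_child.wav")  # placeholder path
    print(frames.shape)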


    Autism spectrum disorder (ASD) is a highly prevalent neurodevelopmental disorder, often characterized in medical research by social-communicative deficits and restricted, repetitive interests. The heterogeneous nature of ASD in its behavioral manifestations encompasses broad syndromes such as Classic Autism (AD), Asperger syndrome (AS), and High-Functioning Autism (HFA). To evaluate the degree and type of these syndromes, doctors diagnose ASD through clinical observation and auxiliary diagnostic tools, one of which is the Autism Diagnostic Observation Schedule (ADOS), a gold-standard diagnostic instrument. However, the existing diagnostic procedure for autism suffers from subjective evaluation, is non-scalable, and is time-consuming. In this work, we design an automatic assessment system based on multimodal behavioral features, including acoustic characteristics and body movements of the participant, and we use the LSTM algorithm together with machine learning techniques to build models on the storytelling part of the ADOS, following the behavioral signal processing (BSP) concept. Our behavior-based measurement achieves competitive, and sometimes higher, recognition accuracies in discriminating between the three syndromes of ASD when compared with the investigator's clinical ratings of the participants during the ADOS.
    Keywords: autism spectrum disorder, autism diagnostic observation schedule, long short-term memory, behavioral signal processing (BSP), multimodal behaviors
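
    The abstract above describes feeding per-frame multimodal behavior sequences to an LSTM to discriminate diagnostic groups. The PyTorch sketch below is one plausible shape for such a sequence classifier, not the author's actual architecture; the feature dimension, hidden size, number of classes, and mean pooling over time are all assumptions made only for illustration.

# Illustrative only: an LSTM sequence classifier over per-frame multimodal
# features (e.g., acoustic + body-movement descriptors). Dimensions and the
# use of mean pooling over time are assumptions, not the thesis's settings.
import torch
import torch.nn as nn

class BehaviorLSTMClassifier(nn.Module):
    def __init__(self, feat_dim=20, hidden_dim=64, num_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim) sequence of frame-level behavior features.
        outputs, _ = self.lstm(x)
        # Summarize the session by averaging hidden states over time.
        pooled = outputs.mean(dim=1)
        return self.head(pooled)           # per-class logits (e.g., AD / AS / HFA)

if __name__ == "__main__":
    model = BehaviorLSTMClassifier()
    dummy = torch.randn(4, 300, 20)        # 4 sessions, 300 frames, 20 features
    print(model(dummy).shape)              # torch.Size([4, 3])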

    Acknowledgements ............................ i
    Chinese Abstract ............................ ii
    Abstract .................................... iii
    Table of Contents ........................... iv
    List of Figures ............................. vi
    List of Tables .............................. vii
    Chapter 1  Introduction ..................... 1
    Chapter 2  Database Description ............. 4
    Chapter 3  Methodology ...................... 7
      3.1  Acoustic Features .................... 8
        3.1.1  VAD .............................. 8
        3.1.2  Voice Identification ............. 9
        3.1.3  Feature Extraction ............... 10
      3.2  Visual Features ...................... 12
        3.2.1  Action Energy .................... 12
      3.3  Long Short-Term Memory ............... 15
      3.4  Functionals .......................... 20
      3.5  Baseline ............................. 21
        3.5.1  Bag-of-Words Encoding ............ 21
        3.5.2  Fisher Vector Encoding ........... 21
      3.6  Classifier Learning and Prediction Model ... 24
      3.7  Student's t-test ..................... 25
    Chapter 4  Experimental Design and Results .. 26
      4.1  Experimental Details ................. 26
      4.2  Experiment 1: TD vs. ASD ............. 30
      4.3  Experiment 2: AD vs. AS vs. HFA ...... 33
      4.4  Experiment 3: Statistical Analysis ... 36
    Chapter 5  Conclusion and Future Work ....... 40
    References .................................. 42

