Graduate Student: | Liu, Yu-Shuo (劉于碩) |
Thesis Title: | Learning Lexical and Speech Coherence Representation by Using LSTM Forget Gate (利用長短期記憶網絡之遺忘閘提取語音及文字流暢度特徵，用以改善自閉症孩童說故事自動辨識系統) |
Advisor: | Lee, Chi-Chun (李祈均) |
Committee Members: | Chi, Tai-Shih (冀泰石); Chen, Yun-Nung (陳縕儂); Lee, Shu-Hui (李姝慧) |
Degree: | Master (碩士) |
Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
Publication Year: | 2019 |
Graduation Academic Year: | 107 |
Language: | English |
Pages: | 57 |
Chinese Keywords: | 人類行為訊號處理、語意流暢度、語音流暢度、長短期記憶模型、自閉症、說故事 |
English Keywords: | behavioral signal processing, lexical coherence, speech fluency, long short-term memory neural network (LSTM), autism spectrum disorder (ASD), story-telling |
Numerous studies on autism spectrum disorder have confirmed that, compared with typically developing children, children with autism generally show delayed verbal ability; reduced fluency in oral narration has therefore become an important indicator for diagnosing autism. Traditionally, measuring fluency in either speech or text has relied on time-consuming manual annotation, or on features designed by trained experts to serve as indicators. This thesis proposes a directly data-driven approach to learning fluency representations: using the forget gate of a long short-term memory (LSTM) network, we derive an embedded text feature that captures the notion of semantic coherence, and an embedded speech feature that captures the notion of vocal fluency. On text, this novel coherence feature achieves 92% accuracy on the task of distinguishing typically developing children from children with autism; on speech, the fluency feature achieves 75% accuracy on the task of distinguishing children rated as fluent speakers from those rated as disfluent.
Compared with the traditional approach that uses syntax, word-frequency, and latent semantic analysis (LSA) features as fluency measures, which reaches 73% accuracy, this is a significant improvement.
This thesis also further validates the meaning of the proposed feature. By randomly shuffling the word order and sentence order within the fluent story narrations of typically developing children, we create artificially disfluent utterances. We find that the feature distribution of these disfluency-injected sentences, extracted by our model, shifts toward the distribution our model extracts from the disfluent utterances of children with autism. This confirms that the derived feature indeed encodes the concept of fluency.
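The core idea of reading coherence features out of an LSTM's forget gate can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual model: the embedding dimension, hidden size, weight values, and mean-pooling step are all illustrative assumptions, and the real system would train the LSTM on the story transcripts before extracting gate activations.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forget_gates(inputs, W, U, b, hidden):
    """Run a single-layer LSTM over `inputs` (T x d) and return the
    forget-gate activations f_t at every time step (T x hidden)."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    gates = []
    for x in inputs:
        z = W @ x + U @ h + b          # all four gate pre-activations, stacked
        i, f, g, o = np.split(z, 4)    # input, forget, cell, output gates
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g              # forget gate scales the previous cell state
        h = o * np.tanh(c)
        gates.append(f)
    return np.stack(gates)

d, hidden, T = 16, 8, 20               # toy embedding dim / hidden size / length
W = rng.normal(scale=0.1, size=(4 * hidden, d))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

sentence = rng.normal(size=(T, d))     # stand-in for a word2vec-embedded sentence
f_t = lstm_forget_gates(sentence, W, U, b, hidden)
coherence_feature = f_t.mean(axis=0)   # pool gate activations into one vector
print(coherence_feature.shape)         # (8,)
```

The intuition is that the forget gate decides how much past context is discarded at each word, so its activation pattern over a narration reflects how smoothly context carries forward; pooling those activations yields a fixed-length feature for a downstream classifier.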
Since children with autism are less able to narrate a fluent story than typically developing children, verbal fluency is an important indicator when diagnosing autism. Fluency assessment, however, requires time-consuming manual tagging, or features specially designed by experts to serve as indicators. This study therefore proposes a coherence representation learned with a directly data-driven architecture, using the forget gate of a long short-term memory (LSTM) model to extract coherence representations from both text and audio. We also use the ADOS codes related to the evaluation of narration to test the proposed representation. The proposed lexical coherence representation achieves a high accuracy of 92% on the task of identifying children with autism versus typically developing children from the text modality, and 83% on the task of distinguishing disfluent autistic children's speech from relatively fluent speech. Compared with traditional text and audio measures, this is a significant improvement.
This thesis further introduces incoherence into coherent samples by randomly shuffling the word and sentence order in the text and by adding pulse-like or repetitive signals to the speech, turning the coherent story content of typically developing children incoherent. Visualizing the samples after dimensionality reduction, we observe the distributions of the coherent, incoherent, and artificially incoherent samples: the artificially incoherent typical samples move closer to the incoherent autistic samples, which confirms that the proposed representation captures the concept of coherence.
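The text-side perturbation above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions: the function name, the toy story, and the fixed seed are hypothetical, and the thesis may shuffle words and sentences with a different granularity or number of rounds.

```python
import random

def make_incoherent(story, seed=0):
    """Create an artificially incoherent version of a story by shuffling
    word order within each sentence, then shuffling sentence order."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = []
    for sentence in story:
        words = sentence.split()
        rng.shuffle(words)             # break within-sentence word order
        shuffled.append(" ".join(words))
    rng.shuffle(shuffled)              # break between-sentence order
    return shuffled

story = ["the frog sat on a log",
         "then he ate a bug",
         "his mouth was sad"]
print(make_incoherent(story))
```

Because the perturbation only permutes existing tokens, the vocabulary and length of each sample are preserved; any shift in the extracted features can then be attributed to the loss of ordering, i.e., of coherence, rather than to a change in content.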