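Graduate student: | Hsu, Ya-Ling (徐雅玲) |
---|---|
Thesis title: | Toward Automatic Pain-Level Detection for Emergency Patients using Fusion of CNN and LSTM Multimodal Audio-Video Features (利用多模態模型混合CNN和LSTM影音特徵以自動化偵測急診病患疼痛程度) |
Advisor: | Lee, Chi-Chun (李祈均) |
Oral defense committee: | Lee, Hung-Yi (李宏毅); Tsao, Yu (曹昱); Lai, Ying-Hui (賴穎暉) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
Year of publication: | 2018 |
Academic year of graduation: | 106 (ROC calendar) |
Language: | Chinese |
Number of pages: | 44 |
Keywords (Chinese): | 急診檢傷分類 (emergency triage classification), 疼痛程度辨識 (pain-level recognition), 行為訊號處理 (behavioral signal processing), 多模態融合 (multimodal fusion), 迴旋積類神經網路 (convolutional neural network), 長短期記憶 (long short-term memory) |
Keywords (English): | Triage, Pain recognition, Behavior signal processing, Multimodal fusion, Convolution neural network, Long short-term memory |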
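In today's healthcare system, the emergency department is often regarded as the most time-efficient route to medical care. To allocate medical resources properly, the Taiwan Society of Emergency Medicine (台灣急診醫學會) and the Taiwan Association of Critical Care Nurses (中華民國急重症護理學會) developed the Taiwan Triage and Acuity Scale (TTAS), modeled on the framework of the Canadian triage system. TTAS defines the classification criteria for triage in Taiwan and plays an important role in assessing the severity of emergency patients' conditions. TTAS uses the Numerical Rating Scale (NRS) to record the patient's self-reported pain level as one of the major modifiers of the triage level. For patients who cannot clearly express their pain, however, the triage nurse or the patient's family members judge the level subjectively, which biases the consistency and validity of the triage classification system. This thesis is a collaboration with emergency physicians at Linkou Chang Gung Memorial Hospital. People reveal their internal feelings through external behavior, so we extract multimodal behavioral signals from patients' facial expressions and voices and model these observable behaviors with machine learning models based on the Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) network, respectively, in order to assess patients' pain level automatically. The experiments reach accuracies of 77.1% and 55.7% on two-class and three-class pain-level recognition, respectively, and our analysis also finds that patients' facial expressions and vocal characteristics are significantly related to their pain level. Together, these results indicate that quantifying and analyzing patients' observable behavior is a feasible route to automatic pain-level assessment.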
Emergency departments are often regarded as one of the most time-efficient ways to seek medical care. To allocate healthcare resources effectively, a triage classification system plays an important role in assessing the severity of illness of patients in the emergency department. The Taiwan Triage and Acuity Scale (TTAS) lists several factors for triage classification, and self-reported pain intensity on the numerical rating scale (NRS) is one of the major modifiers of the current TTAS-based triage system. In clinical practice, physicians and nurses have found this instrument difficult to apply systematically, especially with elderly people, foreigners, or patients with a low education level. As a result, triage nurses often select the level from their own observations instead of soliciting an answer from the patient, which undermines the consistency and validity of the triage classification system. In this thesis, we collaborate with emergency physicians at Linkou Chang Gung Memorial Hospital. We extract multimodal behavioral signals, namely facial expressions and vocal characteristics, from patients and model these behaviors with CNN and LSTM machine learning models, respectively. The experiments achieve accuracies of 77.1% and 55.7% on two-class and three-class pain recognition, respectively. Our analysis further finds that patients' pain level is significantly related to their facial expressions and vocal characteristics.
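The abstract does not spell out how the two modalities are combined or where the NRS scale is cut into two or three classes, so the following is only a minimal sketch under stated assumptions. It illustrates one common form of feature-level fusion (standardize each modality, then concatenate per patient) and a binning of 0-10 NRS scores into class indices. The function names, the cut points `[5]` and `[4, 7]`, and all feature values are hypothetical placeholders, not the thesis's actual settings; the face and audio rows stand in for embeddings that CNN and LSTM models might produce.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def zscore_columns(rows, eps=1e-8):
    """Standardize each feature column to zero mean and (near) unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) + eps for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def fuse_features(face_rows, audio_rows):
    """Feature-level fusion: normalize each modality, then concatenate per patient."""
    if len(face_rows) != len(audio_rows):
        raise ValueError("both modalities must cover the same patients")
    face_n = zscore_columns(face_rows)
    audio_n = zscore_columns(audio_rows)
    return [f + a for f, a in zip(face_n, audio_n)]


def nrs_to_class(nrs, cut_points):
    """Map a 0-10 NRS score to a class index; cut_points are hypothetical."""
    return bisect_right(cut_points, nrs)


# Placeholder embeddings for 4 patients (3 face dims, 2 audio dims).
face = [[0.1, 0.4, 0.3], [0.5, 0.2, 0.9], [0.3, 0.8, 0.1], [0.7, 0.6, 0.5]]
audio = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
fused = fuse_features(face, audio)  # 4 rows, 5 features each

nrs_scores = [0, 3, 6, 9]
two_class = [nrs_to_class(s, [5]) for s in nrs_scores]       # hypothetical split
three_class = [nrs_to_class(s, [4, 7]) for s in nrs_scores]  # hypothetical split
```

The fused rows would then feed a classifier, and accuracy would be computed against the binned NRS labels. Standardizing before concatenation keeps one modality from dominating simply because its features have a larger numeric range.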