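Voice-based Detection and Classification of Vocal Cord Disorders | National Tsing Hua University Theses and Dissertations Repository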

Graduate student:	陳靖杰 Chen, Ching-Chieh
Thesis title:	Voice-based detection and classification of vocal cord disorders (Chinese title: 基於聲訊之聲帶疾病偵測與分類)
Advisor:	劉奕汶 Liu, Yi-Wen
Oral defense committee:	賴穎暉 Lai, Ying-Hui; 鄭桂忠 Tang, Kea-Tiong; 徐慧娟 Hsu, Hui-Chuan
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2023
Graduation academic year:	111 (ROC calendar, i.e., 2022-2023)
Language:	English
Number of pages:	40
Keywords:	vocal cord disorders, support vector machine, convolutional neural network, Mel-frequency cepstral coefficients

Vocal cord disorders are common in today's society, mainly because of an aging population, occupations that demand heavy voice use, and the popularity of communication software. Because throat discomfort is so widespread, many underlying vocal cord disorders are overlooked and the best window for treatment is missed. Diagnostic methods vary with the type of disorder, but they require professional medical equipment or a biopsy, and the type of disorder is difficult to distinguish by merely listening to a patient's voice.

The purpose of this thesis is to build a classification system for vocal cord disorders. In cooperation with China Medical University Hospital, we constructed a voice database of 459 laryngeal patients. The main disorders are vocal cord atrophy, vocal cord paralysis, benign organic vocal cord lesions, and laryngeal cancer. Each recording contains counting from one to ten in Mandarin, a sustained vowel /a/, and the reading of a specific short passage. We use the sustained vowel /a/ as the input: Mel-frequency cepstral coefficients (MFCCs) are extracted from it, and GRBAS scale ratings given by clinicians are also included as input features. A support vector machine (SVM) classifier is used both to detect and to classify vocal cord disorders. We also compare the accuracies obtained with several existing vocal cord disease detection methods. On our dataset, the best detection accuracy is 98.91% and the best classification accuracy is 54.34%.
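The abstract describes the feature pipeline only at a high level. The following is a minimal sketch, not the thesis's implementation, of how an MFCC vector computed from a sustained /a/ could be concatenated with the five clinician GRBAS ratings to form one SVM input vector. It assumes the recording is already loaded as a list of float samples and that frame-level MFCCs are averaged over the whole recording; the function names (`mfcc_mean`, `build_feature_vector`), the HTK-style mel formula, and the defaults (25 ms Hamming frames, 10 ms hop, 26 mel filters, 13 coefficients, no pre-emphasis) are illustrative assumptions. The synthetic harmonic signal stands in for real patient audio.

```python
import math
import cmath


def hz_to_mel(f):
    """HTK-style mel scale (an assumption; other mel variants exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2.0)
    mels = [lo + i * (hi - lo) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def dct_ii(x, n_out):
    """Orthonormal DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    out = []
    for k in range(n_out):
        s = sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def mfcc_mean(signal, sr, frame_ms=25.0, hop_ms=10.0, n_fft=512, n_filters=26, n_ceps=13):
    """Frame-level MFCCs averaged over the whole recording (one vector per file)."""
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    if frame_len > n_fft or n_fft & (n_fft - 1):
        raise ValueError("n_fft must be a power of two >= frame length")
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sr)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]  # Hamming-windowed frame
        spec = fft(frame + [0.0] * (n_fft - frame_len))  # zero-pad to n_fft
        power = [abs(c) ** 2 / n_fft for c in spec[: n_fft // 2 + 1]]
        log_energy = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
        frames.append(dct_ii(log_energy, n_ceps))
    if not frames:
        raise ValueError("signal is shorter than one frame")
    return [sum(f[c] for f in frames) / len(frames) for c in range(n_ceps)]


def build_feature_vector(signal, sr, grbas):
    """Utterance-level MFCCs followed by the five GRBAS ratings (G, R, B, A, S; each 0-3)."""
    if len(grbas) != 5 or not all(0 <= g <= 3 for g in grbas):
        raise ValueError("expected five GRBAS ratings in the range 0-3")
    return mfcc_mean(signal, sr) + [float(g) for g in grbas]


# Synthetic 0.5 s stand-in for a sustained /a/: a 150 Hz fundamental plus two harmonics.
sr = 16000
vowel = [sum(a * math.sin(2 * math.pi * f0 * n / sr) for a, f0 in ((1.0, 150), (0.5, 300), (0.25, 450)))
         for n in range(sr // 2)]
features = build_feature_vector(vowel, sr, grbas=[1, 1, 0, 0, 1])
print(len(features))  # 13 MFCCs + 5 GRBAS ratings = 18
```

In practice, such vectors would be standardized before being passed to an SVM, for example with scikit-learn's `StandardScaler` followed by `SVC`. Scaling matters here because MFCCs and the 0 to 3 GRBAS ratings lie on very different ranges. For the multiclass task, `SVC` trains one-against-one classifiers internally. Changing `n_ceps` is one way to study how the MFCC dimension affects accuracy.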

Table of Contents
Introduction
  1 Related Work
  2 Research Methods
  3 Contributions
  4 Thesis Organization
Mechanism of Voice Production and Vocal Cord Disorders
  1 Vocal System
  2 Diagnosis
  3 Vocal Cord Disorders
    3.1 Acute and Chronic Laryngitis
    3.2 Vocal Cord Atrophy and Sulcus
    3.3 Vocal Cord Paralysis
    3.4 Benign Organic Lesions
    3.5 Vocal Cord Leukoplakia and Laryngeal Cancer
Materials and Methods
  1 Dataset
    1.1 China Medical University Hospital (CMUH) Dataset
    1.2 Saarbruecken Voice Database (SVD)
    1.3 Data Content
  2 Features
    2.1 Mel-Frequency Cepstral Coefficients
    2.2 GRBAS Scale
    2.3 The Spectrogram as an Image
  3 Machine Learning Models
    3.1 Support Vector Machine (SVM)
    3.2 Convolutional Neural Networks (CNN)
Experiments
  1 Experimental Methods
  2 Procedure of the Experiments
  3 Training Hyperparameters
Results and Discussion
  1 The Effects of the Dimension of MFCC
  2 Cross-Database Comparison
  3 Combination of MFCC and GRBAS
  4 Results of CNN
Conclusions
Future Work
  1 Exploring Chinese Pronunciations with Vocal Cord Disorders
  2 A Diagnostic Model Combining Speech and Questionnaire
  3 Parallel Model of Voice and Laryngoscope Images
  4 Voice Quality Assessment Before and After Surgery
References
Appendix
  A.1 Suggestions from the Oral Defense Committee

