
Author: Chen, Ching-Chieh (陳靖杰)
Thesis Title: Voice-based detection and classification of vocal cord disorders (基於聲訊之聲帶疾病偵測與分類)
Advisor: Liu, Yi-Wen (劉奕汶)
Committee Members: Lai, Ying-Hui (賴穎暉); Tang, Kea-Tiong (鄭桂忠); Hsu, Hui-Chuan (徐慧娟)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2023
Academic Year of Graduation: 111 (ROC calendar)
Language: English
Number of Pages: 40
Keywords: vocal cord disorders, support vector machine, convolutional neural network, Mel-frequency cepstral coefficients


    Vocal cord disorders are common in today’s society, mainly because of an aging
    population, occupations that require frequent voice use, and the popularity of
    communication software. Because throat discomfort is so common, many underlying
    vocal cord disorders are overlooked, and the best time for treatment is missed.
    Diagnostic methods vary by disorder, but they require professional medical
    equipment or a biopsy; it is difficult to distinguish the type of disorder merely
    by listening to the patient’s voice. The purpose of this thesis is to build a
    classification system for vocal cord disorders. We cooperated with China Medical
    University Hospital to construct a voice database of 459 laryngology patients.
    The main disorders include vocal cord atrophy, vocal cord paralysis, benign
    organic vocal cord lesions, and laryngeal cancer. Each recording contains counting
    from one to ten in Mandarin, a sustained vowel /a/, and a reading of a specific
    short text. We take the sustained vowel /a/ as the input. Mel-frequency cepstral
    coefficients (MFCCs) are extracted, and GRBAS scale ratings given by clinicians
    are jointly considered as input features. A support vector machine (SVM) is used
    to detect and classify vocal cord disorders. We also compare against the accuracies
    obtained with several existing vocal cord disorder detection methods. On our
    dataset, the best detection accuracy is 98.91%, and the best classification
    accuracy is 54.34%.
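
    The feature construction described in the abstract (MFCCs from a sustained /a/,
    joined with clinician-rated GRBAS scores) can be sketched roughly as follows.
    This is a minimal, standard-library-only illustration and not the thesis's
    implementation: the function names, the 25 ms frame at 16 kHz, the 20 mel
    filters, the 13 coefficients, the synthetic test tone, and the example GRBAS
    ratings are all assumptions made for the example, since this record does not
    state those details.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling slope, peak of 1.0 at k == mid
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Small floor avoids log(0) for filters that capture no energy.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_e))
            for c in range(num_coeffs)]


def build_feature_vector(mfcc_vec, grbas):
    """Concatenate MFCCs with the five GRBAS ratings (G, R, B, A, S; each 0-3)."""
    if len(grbas) != 5 or any(not 0 <= g <= 3 for g in grbas):
        raise ValueError("GRBAS must be five ratings in the range 0-3")
    return list(mfcc_vec) + [float(g) for g in grbas]


# Synthetic stand-in for one 25 ms frame of a sustained vowel (assumed 16 kHz).
sample_rate = 16000
frame = [math.sin(2 * math.pi * 150 * i / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 300 * i / sample_rate)
         for i in range(400)]
example_grbas = [2, 1, 2, 0, 1]  # made-up ratings, for illustration only
features = build_feature_vector(mfcc(frame, sample_rate), example_grbas)
print(len(features))  # 13 MFCCs + 5 GRBAS ratings = 18
```

    In practice the per-frame MFCCs would be summarized over the whole recording
    (for example by averaging) before being passed, together with the GRBAS
    ratings, to an SVM classifier; an audio library would normally replace the
    naive DFT used here.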

    1 Introduction
        1.1 Related Work
        1.2 Research Methods
        1.3 Contributions
        1.4 Thesis Organization
    2 Mechanism of Voice Production and Vocal Cord Disorders
        2.1 Vocal system
        2.2 Diagnosis
        2.3 Vocal Cord Disorders
            2.3.1 Acute and Chronic Laryngitis
            2.3.2 Vocal Cord Atrophy and Sulcus
            2.3.3 Vocal Cord Paralysis
            2.3.4 Benign Organic Lesions
            2.3.5 Vocal Cord Leukoplakia and Laryngeal Cancer
    3 Materials and Methods
        3.1 Dataset
            3.1.1 Chinese Medical University Hospital (CMUH) Dataset
            3.1.2 Saarbruecken Voice Database (SVD)
            3.1.3 Data Content
        3.2 Features
            3.2.1 Mel Frequency Cepstral Coefficients
            3.2.2 GRBAS Scale
            3.2.3 The Spectrogram as an Image
        3.3 Machine Learning Models
            3.3.1 Support Vector Machine (SVM)
            3.3.2 Convolutional Neural Networks (CNN)
    4 Experiments
        4.1 Experiment Methods
        4.2 Procedure of the Experiments
        4.3 Training Hyperparameters
    5 Results and Discussion
        5.1 The Effects of the Dimension of MFCC
        5.2 Cross Database Comparison
        5.3 Combination of MFCC and GRBAS
        5.4 Results of CNN
    6 Conclusions
    7 Future Work
        7.1 Exploring on Chinese Pronunciations with Vocal Cord Disorders
        7.2 A diagnostic model combining speech and questionnaire
        7.3 Parallel Model of Voice and Laryngoscope Images
        7.4 Voice Quality Assessment Before and After Surgery
    References
    Appendix
        A.1 Suggestions From the Oral Defense Committees

