
Author: 林琦靜 (Lin, Chi-Jin)
Thesis title: 不同發音方式所測得之電聲門圖與音訊綜合比較與分析
Joint comparison and analysis of EGG and audio signals measured during different types of phonation
Advisor: 劉奕汶 (Liu, Yi-Wen)
Committee members: 蔡振家 (Tsai, Chen-Gia), 鄭桂忠 (Tang, Kea-Tiong)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 42
Keywords (Chinese): 電聲門圖, 聲門逆濾波, 聲門氣流, 發聲方式
Keywords (English): Electroglottography, Glottal inverse filtering, Glottal flow, Phonation type
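  • Abstract (translated from the Chinese): This study examines how the vocal folds, the sound source of the human voice, vibrate during voice production. In phonation, the intricate structures of the larynx regulate one another so that the vocal fold mucosa can oscillate in different ways, producing a rich variety of sounds. Using an electroglottograph to measure the vocal folds while synchronously recording the audio signal, we collected three phonation types (breathy, modal, and pressed), each at different pitches and on five main monophthongs. In the electroglottographic (EGG) physiological signal, we found that the waveform differs from the typical smooth EGG shown in the literature: the EGG waveform of each vowel carries ripples with distinct characteristics, and K-nearest neighbor classification of these waveforms reaches a cross-validated accuracy of 0.536. Across phonation types, the closed quotient distinguishes the types, with a cross-validated K-nearest neighbor accuracy of 0.899, indicating that the EGG records, to a certain degree, how the vocal folds vibrate under different phonation types. To examine the glottal information contained in the audio signal, the thesis also applies glottal flow model iterative adaptive inverse filtering to extract the glottal signal from the audio; the ripples in the extracted glottal signal show features similar to the waveform at the corresponding positions of the EGG.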


    This thesis explores how the vocal folds vibrate during voice production. The human phonation mechanism adjusts the complex structures near the glottis and uses the laryngeal muscles to control vocal fold vibration. The literature suggests that when a singer uses different techniques, the vocal fold mucosa vibrates differently. In this research, vocal fold vibration is recorded via electroglottography (EGG). The EGG signal and the audio signal are recorded simultaneously, and we collected breathy, modal, and pressed phonation of notes sung at different pitches on five distinct monophthongs. Qualitatively, the EGG waveforms we observed are not as smooth as those reported in several papers: each period shows ripples whose shape depends on the vowel. We therefore applied the K-nearest neighbor (KNN) algorithm to classify the waveforms by vowel. The cross-validated KNN accuracy is 0.536, well above the chance level of 0.2 for five classes. We also found that the EGG signals of different phonation types can be distinguished by the closed quotient; here the cross-validated KNN accuracy is 0.899. This result shows that the phonation type can be inferred from the EGG signal. Finally, we used glottal flow model iterative adaptive inverse filtering (GFM-IAIF) to extract the glottal source signal from the audio signal. The waveform of the extracted glottal source has characteristic ripples that resemble those in the EGG waveform.
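    To make the phonation-type result concrete, the following is a minimal sketch of the general idea, not the thesis's implementation: a closed quotient is computed per cycle with an amplitude-threshold criterion, then used as a single feature for KNN evaluated by leave-one-out cross-validation. The 50% threshold level, the half-sine test cycles, the duty values, and all function names are illustrative assumptions; the record does not state which settings or data layout the thesis used.

```python
import math
from collections import Counter


def closed_quotient(cycle, level=0.5):
    """Fraction of one EGG cycle whose amplitude exceeds a threshold.

    The threshold sits `level` of the way from the cycle minimum to its
    maximum. The value 0.5 is an assumed setting, not taken from the thesis.
    """
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        return 0.0
    threshold = lo + level * (hi - lo)
    return sum(1 for x in cycle if x > threshold) / len(cycle)


def synthetic_cycle(duty, n=200):
    """Hypothetical EGG-like cycle: a half-sine pulse filling `duty` of the period."""
    width = round(duty * n)
    return [math.sin(math.pi * i / width) if i < width else 0.0 for i in range(n)]


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query` (1-D feature)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample in turn, classify it from the rest, return the hit rate."""
    hits = 0
    for i, (feature, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += knn_predict(rest, feature, k) == label
    return hits / len(samples)


# Made-up pulse widths per phonation type, for illustration only.
DUTIES = {
    "breathy": [0.30, 0.34, 0.38],
    "modal": [0.50, 0.54, 0.58],
    "pressed": [0.70, 0.74, 0.78],
}
samples = [
    (closed_quotient(synthetic_cycle(d)), label)
    for label, duties in DUTIES.items()
    for d in duties
]
accuracy = leave_one_out_accuracy(samples)
print(f"leave-one-out accuracy on synthetic data: {accuracy:.3f}")
```

    On real recordings the cycles would first have to be segmented, for example at glottal closure instants, and the classes overlap far more than in this toy data, so a perfect score here says nothing about the 0.899 reported above.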

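    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background and Motivation
      1.2 Literature Review
      1.3 Research Methods
      1.4 Chapter Outline
    Chapter 2 Mechanisms of Human Voice Production
      2.1 Physiological Mechanisms of Sound Production
        2.1.1 Respiration
        2.1.2 Phonation
        2.1.3 Resonance
        2.1.4 Articulation
      2.2 Laryngeal Structure
      2.3 Phonation Type
    Chapter 3 The Electroglottographic Physiological Signal
      3.1 Introduction to Electroglottography
      3.2 Relationship Between the EGG Signal and Vocal Fold Contact Area
      3.3 Parameters of the EGG Signal
    Chapter 4 EGG Data Collection and Signal Analysis
      4.1 EGG Data Collection
      4.2 EGG Data
      4.3 Analysis of EGG Waveform Parameters
      4.4 Analysis of EGG Waveforms
        4.4.1 K-Nearest Neighbor Classification
        4.4.2 EGG Analysis of the Five Vowels
        4.4.3 EGG Analysis of the Three Phonation Types
    Chapter 5 Mapping from Audio to the Physiological Signal
      5.1 Glottal Inverse Filtering
      5.2 Glottal Flow Model Iterative Adaptive Inverse Filtering
      5.3 GFM-IAIF Analysis Results
      5.4 Similarity Analysis of GFM-IAIF and EGG
        5.4.1 Dynamic Time Warping
        5.4.2 Correlation Coefficient
    Chapter 6 Conclusions and Future Work
    References
    Appendix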
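    The outline names dynamic time warping and the correlation coefficient (Sections 5.4.1 and 5.4.2) as the measures used to compare the GFM-IAIF output with the EGG. As a rough sketch of what these two measures compute, assuming two equally sampled one-period waveforms and the textbook formulations (absolute-difference local cost, unconstrained warping path, Pearson correlation), one could write the following. The function names are hypothetical, and the thesis may use different cost functions, normalization, or path constraints.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two sequences.

    cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j],
    using |x - y| as the local distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy waveforms standing in for one period of each signal (not real data).
egg_like = [math.sin(2 * math.pi * t / 50) for t in range(50)]
shifted = [math.sin(2 * math.pi * (t - 3) / 50) for t in range(50)]

print("DTW cost:", dtw_distance(egg_like, shifted))
print("Pearson r:", pearson(egg_like, shifted))
```

    The two measures complement each other: DTW tolerates small timing offsets between the waveforms, while the correlation coefficient compares them sample by sample and is therefore sensitive to how well the two signals are synchronized.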

