Graduate Student: Lin, Chi-Jin (林琦靜)
Thesis Title: Joint comparison and analysis of EGG and audio signals measured during different types of phonation (不同發音方式所測得之電聲門圖與音訊綜合比較與分析)
Advisor: Liu, Yi-Wen (劉奕汶)
Committee Members: Tsai, Chen-Gia (蔡振家); Tang, Kea-Tiong (鄭桂忠)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2021
Graduation Academic Year: 109
Language: Chinese
Number of Pages: 42
Keywords (Chinese, translated): electroglottography, glottal inverse filtering, glottal flow, phonation type
Keywords (English): Electroglottography, Glottal inverse filtering, Glottal flow, Phonation type
Abstract (translated from Chinese): This study investigates how the vocal folds, as the voice source, vibrate during voice production. The phonation mechanism relies on the finely coordinated structures of the larynx, which allow the vocal fold mucosa to oscillate in different ways and produce a rich variety of sounds. Using an electroglottograph (EGG) to measure the vocal folds while simultaneously recording the audio signal, we collected three phonation types: breathy, modal, and pressed; for each type, recordings were made at different pitches and on the five principal single vowels. In the EGG physiological signal, we found that the waveform differs from the typically smooth EGG waveforms reported in the literature; for each vowel, the EGG waveform exhibits ripples with distinct characteristics, and a K-nearest neighbor (KNN) classifier achieved a cross-validated accuracy of 0.536 on the vowels. Across phonation types, the closed quotient distinguished the three types, and the KNN classifier achieved a cross-validated accuracy of 0.899, indicating that the EGG records the vocal fold vibration of different phonation types to a considerable extent. To examine the glottal information contained in the audio signal itself, this thesis also applies glottal flow model iterative adaptive inverse filtering (GFM-IAIF) to extract the glottal signal from the audio; the ripples in the extracted glottal signal share similar characteristics with the EGG waveform at the corresponding positions.
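The closed quotient mentioned above is the fraction of each glottal cycle during which the folds are in contact. As a minimal sketch of how it could be computed from one EGG period (the thesis's exact threshold criterion is not given here; a 25% amplitude threshold is assumed for illustration):

```python
def closed_quotient(egg_period, threshold_ratio=0.25):
    """Fraction of one EGG period during which the vocal folds are
    considered 'in contact': samples above an amplitude threshold
    placed at threshold_ratio of the peak-to-peak range."""
    lo, hi = min(egg_period), max(egg_period)
    thresh = lo + threshold_ratio * (hi - lo)
    return sum(1 for s in egg_period if s > thresh) / len(egg_period)

# Idealized one-period EGG: contact phase (high) then open phase (low).
period = [1.0] * 40 + [0.0] * 60
print(closed_quotient(period))  # → 0.4
```

Real EGG cycles would first be segmented at glottal closure instants; the threshold ratio is a common tunable choice rather than a fixed standard.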
Abstract (English): This thesis explores how the vocal folds vibrate during voice production. The human phonation mechanism adjusts the complex structures near the glottis and uses the laryngeal muscles to control the vibration of the vocal folds. The literature suggests that when a singer sings with different techniques, the vocal fold mucosa vibrates differently. In this research, the condition of vocal fold vibration is recorded via electroglottography (EGG). The EGG signal and the voice audio signal were recorded simultaneously, and we collected breathy, modal, and pressed phonation of notes sung at different pitches and on five distinct single vowels. We found that, qualitatively, the observed EGG waveform is not as smooth as reported in several papers; it shows ripples within each period for all five vowels. We therefore applied the K-nearest neighbor (KNN) algorithm to classify the waveforms of the different vowels. The cross-validated KNN accuracy is 0.536, well above chance level (0.2). We also found that the EGG signals of different phonation types can be distinguished by the closed quotient; here the cross-validated KNN accuracy is 0.899. This result shows that the phonation type can be inferred from the EGG signal. Finally, we used glottal flow model iterative adaptive inverse filtering (GFM-IAIF) to extract the glottal source signal from the audio signal. The extracted glottal source exhibits characteristic ripples that resemble the waveform of the EGG signals.
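The KNN-with-cross-validation step can be sketched in a few lines. The feature values and class means below are purely illustrative (the thesis's actual closed-quotient distributions are not reproduced); the sketch uses leave-one-out cross-validation on a one-dimensional feature:

```python
import random
from collections import Counter

def knn_predict(train, x, k=5):
    """Classify x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loocv_accuracy(data, k=5):
    """Leave-one-out cross-validation accuracy over (feature, label) pairs."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, x, k) == y
    return hits / len(data)

# Synthetic closed-quotient values per phonation type (hypothetical means).
rng = random.Random(0)
data = [(rng.gauss(mu, 0.02), label)
        for mu, label in [(0.3, "breathy"), (0.5, "modal"), (0.7, "pressed")]
        for _ in range(30)]
print(loocv_accuracy(data))
```

With well-separated class means the accuracy approaches 1.0; the thesis's 0.899 reflects real overlap between phonation types.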
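GFM-IAIF models the glottal source and the vocal tract with separate low- and higher-order linear-prediction (LPC) filters applied iteratively; the core operation at every iteration is plain LPC inverse filtering. The sketch below shows only that core operation on a synthetic AR signal, not the thesis's GFM-IAIF implementation:

```python
import numpy as np

def lpc_inverse_filter(x, order):
    """Fit linear-prediction coefficients by least squares and return
    (coefficients, residual). The residual is the inverse-filtered signal."""
    # Each row holds the `order` past samples; the target is the current one.
    X = np.column_stack([x[order - k:len(x) - k] for k in range(1, order + 1)])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a, x[order:] - X @ a

# Synthetic AR(2) "vocal tract" driven by white-noise excitation.
rng = np.random.default_rng(1)
e = rng.standard_normal(2000)
s = np.zeros_like(e)
for n in range(2, len(e)):
    s[n] = 1.3 * s[n - 1] - 0.8 * s[n - 2] + e[n]

a, residual = lpc_inverse_filter(s, order=2)
# Inverse filtering approximately recovers the AR coefficients and
# whitens the signal: the residual is close to the original excitation.
```

In GFM-IAIF the residual of the vocal-tract filter plays the role of the glottal flow derivative; dedicated implementations also handle lip radiation and windowing, which this sketch omits.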