Graduate student: | 彭上軒 Peng, Shang-Hsuan |
---|---|
Thesis title: | 從語音訊號估計發聲狀態下之口鼻腔截面積 Estimation of oral cavity and nasal cavity cross-section areas from speech signals during articulation |
Advisor: | 劉奕汶 Liu, Yi-Wen |
Committee members: | 李沛群 Pei-Chun Li, 冀泰石 Tai-shih Chi |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2014 |
Academic year of graduation: | 103 |
Language: | Chinese |
Number of pages: | 52 |
Keywords (Chinese): | 線性預估, 逆散射, 分析格狀架構 |
Keywords (English): | linear prediction, inverse scattering, analysis lattice structure |
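Recovering the vocal tract area function (VTAF) from a speech signal is an
inverse scattering problem. In medicine, VTAF estimates can be applied to the
clinical analysis of aphasia or dysarthria, speech training for the
hearing-impaired, language learning, and speech correction. They provide a means
of visual feedback on sound.
In the analysis method established by Wakita, the inverse digital filter of signal
analysis and the acoustic tube model of physics are shown to perform the same
filtering process. This can be explained through the conversion between linear
prediction coding and the analysis lattice structure, with the Levinson-Durbin
recursive algorithm used to speed up the computation. Because this procedure
restricts the filter's transfer function to all-pole form, the whole speech
production process is described as one in which the vocal folds vibrate as the
sound source and the sound wave travels from the throat to the oral cavity and is
then emitted at the lips; the effect of nasal resonance is not considered.
This thesis uses the method proposed by Schnell and Lacroix, which is based on the
Burg lattice and recursively describes the transfer function of the whole vocal
tract resonator in pole-zero form. Then, based on this transfer function and on
the anatomy and approximate dimensions of the human vocal tract, we set suitable
initial and boundary conditions and describe the vocal tract as a three-branched
model composed of a main tract, an oral cavity, and a nasal cavity. Finally, using
simple polynomial factorization together with an exhaustive search for the best
solution, we attempt to estimate the cross-sectional areas of the oral and nasal
cavities simultaneously while taking nasal resonance into account. We use
nasalized and non-nasalized vowels as test cases to verify the reasonableness of
the acoustic tube model and the inverse scattering solution.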
Estimating the vocal tract area function (VTAF) from speech signals is an
inverse scattering problem. Medical applications of VTAF estimation include
clinical analysis of aphasia and dysarthria, as well as language training and
phoniatrics for the hearing-impaired. The estimated VTAF provides visual feedback
on speech sounds.
According to Wakita's method, the filtering performed by the inverse digital filter
in signal analysis is identical to that of the acoustic tube model in physics. This
equivalence can be explained by the relationship between linear prediction coding
and the analysis lattice structure, and the computation can be made more efficient
with the Levinson-Durbin recursive algorithm. Because the transfer function is
restricted to all-pole form, Wakita modeled speech production as follows: sound
waves generated at the glottis travel from the throat to the oral cavity and then
radiate at the lips. This model does not take nasal resonance into account.
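As a minimal sketch of the all-pole analysis described above (not Wakita's exact formulation), the following Python code runs the Levinson-Durbin recursion on an autocorrelation sequence and converts the resulting reflection coefficients into relative tube areas. The function names, the sign convention of the reflection coefficients, the lips-to-glottis ordering, and the unit lip area are assumptions made for illustration; in particular, the direction of the area ratio depends on the convention adopted.

```python
import numpy as np


def autocorr(x, order):
    """Biased (unnormalized) autocorrelation r[0..order] of one frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, k, err) where A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p is the
    inverse (prediction-error) filter, k holds the reflection coefficients,
    and err is the final prediction-error energy.
    """
    r = np.asarray(r, dtype=float)
    order = len(r) - 1
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for m in range(1, order + 1):
        # Correlation between the current predictor and lag m.
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k[m - 1] = -acc / err
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] + k[m - 1] * a_prev[m - i]
        a[m] = k[m - 1]
        err *= 1.0 - k[m - 1] ** 2
    return a, k, err


def areas_from_reflection(k, lip_area=1.0):
    """Relative section areas of a lossless tube model.

    Assumed convention (illustrative only): sections are listed from the
    lips toward the glottis, and each junction scales the area by
    (1 + k) / (1 - k).
    """
    areas = [lip_area]
    for km in k:
        areas.append(areas[-1] * (1.0 + km) / (1.0 - km))
    return np.array(areas)
```

For the ideal first-order autocorrelation `r = [1, 0.5, 0.25]`, the recursion returns reflection coefficients `[-0.5, 0]`: the second section adds no new reflection, so under the assumed convention the last two areas are equal.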
In this thesis, we use an iterative procedure proposed by Schnell and Lacroix to
obtain a pole-zero transfer function. We then set initial conditions and boundary
conditions based on the typical size and shape of the human vocal tract, and
describe the vocal tract as a three-branched model consisting of the main tract,
the oral cavity, and the nasal cavity. Finally, with the help of polynomial
factorization and exhaustive search, we estimate the VTAF with the nasal resonance
effect taken into account. We use nasalized and non-nasalized vowels to verify
the validity of the acoustic tube model and of the solution to the inverse
scattering problem.
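The pole-zero procedure is described in this record as building on the Burg lattice. As a minimal sketch of that all-pole building block only (it is not the Schnell and Lacroix pole-zero iteration, and it does not model the nasal branch), the following Python code estimates lattice reflection coefficients directly from a signal frame by minimizing the sum of forward and backward prediction-error energies at each stage. The function name and the sign convention are assumptions made for illustration.

```python
import numpy as np


def burg_reflection(x, order):
    """Estimate lattice reflection coefficients with Burg's method.

    Lattice update assumed here:
        f_m[n] = f_{m-1}[n]   + k_m * b_{m-1}[n-1]
        b_m[n] = b_{m-1}[n-1] + k_m * f_{m-1}[n]
    """
    x = np.asarray(x, dtype=float)
    f = x[1:].copy()    # forward error f_0[n] for n = 1..N-1
    b = x[:-1].copy()   # delayed backward error b_0[n-1]
    k = np.zeros(order)
    for m in range(order):
        den = np.dot(f, f) + np.dot(b, b)
        # Minimizer of the summed forward and backward error energy.
        k[m] = -2.0 * np.dot(f, b) / den if den > 0 else 0.0
        f_new = f + k[m] * b
        b_new = b + k[m] * f
        # Realign so that f holds f_m[n] and b holds b_m[n-1].
        f = f_new[1:]
        b = b_new[:-1]
    return k
```

Each coefficient produced this way has magnitude at most 1, so the corresponding all-pole lattice cannot have poles outside the unit circle; this is one common reason for preferring a lattice estimate over the autocorrelation method on short frames.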
References
[1] 王小川, 語音訊號處理 [Speech Signal Processing], rev. 2nd ed. 全華圖書, 2008.
[2] C. T. Ferrand, Speech Science: An Integrated Approach to Theory and Clinical
Practice, 1st ed. Pearson, 2001.
[3] H. Wakita, “Direct estimation of the vocal tract shape by inverse filtering of
acoustic speech waveforms,” IEEE Trans. Audio Electroacoust., vol. 21, no. 5,
pp. 417–427, Oct. 1973.
[4] D. J. Ertmer, R. E. Stark, and G. R. Karlan, “Real-time spectrographic displays
in vowel production training with children who have profound hearing loss,”
Am. J. Speech-Language Pathol., vol. 5, no. 4, p. 4, Nov. 1996.
[5] S. G. Fletcher et al., “Teaching vowels to profoundly
hearing-impaired speakers using glossometry,” J. Speech Hear. Res., vol. 34,
no. 4, pp. 943–56, Nov. 1990.
[6] S. G. Fletcher et al., “Teaching consonants to profoundly
hearing-impaired speakers using palatometry,” J. Speech Hear. Res., vol. 34,
no. 4, pp. 929–42, Nov. 1990.
[7] J. Makhoul, “Linear prediction: A tutorial review,” Proc. IEEE, vol. 63, no. 4,
pp. 561–580, 1975.
[8] N. Levinson, “The Wiener RMS (root mean square) error criterion in filter design
and prediction,” J. Math. Phys., vol. 25, no. 4, pp. 261–278, 1947.
[9] E. A. Robinson, Statistical communication and detection. New York: Hafner,
1967.
[10] J. Durbin, “The fitting of time-series models,” Rev. l’Institut Int. Stat. / Rev. Int.
Stat. Inst., vol. 28, no. 3, pp. 233–243, Dec. 1960.
[11] I.-T. Lim and B. G. Lee, “Lossless pole-zero modeling of speech signals,”
IEEE Trans. Speech Audio Process., vol. 1, no. 3, pp. 269–276, Jul. 1993.
[12] I.-T. Lim and B. G. Lee, “Lossy pole-zero modeling for speech signals,” IEEE
Trans. Speech Audio Process., vol. 4, no. 2, pp. 81–88, Mar. 1996.
[13] I.-T. Lim and B. G. Lee, “A generalized vocal tract model for pole-zero type
linear prediction (speech processing),” ICASSP-88., Int. Conf. Acoust. Speech,
Signal Process., pp. 687–690, 1988.
[14] A. Lacroix, “Improved vocal tract model for the analysis of nasal speech
sounds,” 1996 IEEE Int. Conf. Acoust. Speech, Signal Process. Conf. Proc., vol.
2, pp. 801–804, 1996.
[15] J. Schroeter, “Techniques for estimating vocal-tract shapes from the speech
signal,” IEEE Trans. Speech Audio Process., vol. 2, no. 1, pp. 133–150, 1994.
[16] H. Kamata, H. Oka, and Y. Ishida, “Estimation of vocal tract transfer function
considering the glottis open and close characteristics,” Proc. IEEE Pacific Rim
Conf. Commun. Comput. Signal Process., vol. 1, pp. 137–140, 1993.
[17] H. Kamata, K. Kawaguchi, Y. Ishida, and T. Honda, “Reconstruction of human
voice using parallel structure transfer function and its estimation error,” IEEE
Pacific Rim Conf. Commun. Comput. Signal Process. Proc., pp. 575–580,
1995.
[18] H. Deng, R. K. Ward, M. P. Beddoes, and M. Hodgson, “A new method for
obtaining accurate estimates of vocal-tract filters and glottal waves from vowel
sounds,” IEEE Trans. Audio, Speech Lang. Process., vol. 14, no. 2, pp.
445–455, Mar. 2006.
[19] R. K. Ward, M. P. Beddoes, and M. Hodgson, “Estimating vocal-tract area
functions from vowel sound signals over closed glottal phases,” 2004 IEEE Int.
Conf. Acoust. Speech, Signal Process., vol. 1, pp. I–589–92, 2004.
[20] R. K. Ward, M. P. Beddoes, and M. Hodgson, “Effects of Glottal and Lip
Boundary Conditions on Vocal-Tract Area Function Estimates from Speech
Signals,” 2005 IEEE Int. Conf. Acoust. Speech, Signal Process., vol. 1, pp.
901–904, 2005.
[21] R. K. Ward, M. P. Beddoes, and D. O’Shaughnessy, “Obtaining Lip and
Glottal Reflection Coefficients from Vowel Sounds,” 2006 IEEE Int. Conf.
Acoust. Speech Signal Process. Proc., vol. 1, pp. I–373–I–376, 2006.
[22] H. Deng, R. Ward, and M. Beddoes, “Glottal waves via inverse filtering of
vowel sounds,” Conf. Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., vol. 7,
pp. 7000–3, Jan. 2005.
[23] J. Makhoul and R. Viswanathan, “Adaptive lattice methods for linear
prediction,” ICASSP ’78. IEEE Int. Conf. Acoust. Speech, Signal Process., vol.
3, pp. 83–86, 1978.
[24] T. Carter, “Study of an adaptive lattice structure for linear prediction analysis
of speech,” ICASSP ’78. IEEE Int. Conf. Acoust. Speech, Signal Process., vol.
3, pp. 27–30, 1978.
[25] J. Makhoul and L. Cosell, “Adaptive lattice analysis of speech,” IEEE Trans.
Acoust., vol. 29, no. 3, pp. 654–659, Jun. 1981.
[26] S. Dandapat and G. C. Ray, “Evaluation of vocal tract disorder using a variable
step adaptive filter,” Proc. First Reg. Conf. IEEE Eng. Med. Biol. Soc. 14th
Conf. Biomed. Eng. Soc. India. An Int. Meet., pp. 3/81–3/82, 1995.
[27] A. Lacroix, “Pole-zero modeling of vocal tract for fricative sounds,” 1997
IEEE Int. Conf. Acoust. Speech, Signal Process., vol. 3, pp. 1659–1662, 1997.
[28] P. Kabal and R. P. Ramachandran, “The computation of line spectral
frequencies using Chebyshev polynomials,” IEEE Trans. Acoust., vol. 34, no. 6,
pp. 1419–1426, Dec. 1986.
[29] J. D. Markel and A. H. Gray, Linear Prediction of Speech. New York:
Springer-Verlag, 1976.
[30] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing,
International Edition, 3rd ed. Pearson, 2009.
[31] S. M. Kay and S. L. Marple, “Spectrum analysis: A modern perspective,” Proc.
IEEE, vol. 69, no. 11, pp. 1380–1419, 1981.
[32] J. P. Burg, “Maximum entropy spectral analysis,” Stanford Univ., Stanford,
CA, 1975.
[33] J. Makhoul, “Stable and efficient lattice methods for linear prediction,” IEEE
Trans. Acoust., vol. 25, no. 5, pp. 423–428, Oct. 1977.
[34] P. M. Morse and K. U. Ingard, Theoretical Acoustics. New York: McGraw-Hill,
1968, pp. 244–252.
[35] B. S. Atal, “Speech Analysis and Synthesis by Linear Prediction of the Speech
Wave,” J. Acoust. Soc. Am., vol. 50, no. 2B, p. 637, 1971.
[36] F. Itakura and S. Saito, “Digital Filtering Techniques for Speech Analysis and
Synthesis,” Proc. 7th Int. Conf. Acoust., 1971.
[37] K. Schnell and A. Lacroix, “Pole zero estimation from speech signals by an
iterative procedure,” 2001 IEEE Int. Conf. Acoust. Speech, Signal Process.
Proc. (Cat. No.01CH37221), vol. 1, pp. 109–112, 2001.
[38] H.-K. Huang, Y.-W. Liu, and R. P.-Y. Chiang, “Detection of obstructive sleep
apnea by estimation of oral and nasal cavity cross-section areas from acoustic
recordings of snore,” Proc. Meet. Acoust., vol. 19, no. 1, pp. 060172–060172,
Jun. 2013.
[39] J. O. Smith, “Introduction to Digital Filters with Audio Applications.” [Online].
Available: https://ccrma.stanford.edu/~jos/filters/. [Accessed: 31-Jul-2014].
[40] K. S. Nataraj, P. C. Pandey, and M. S. Shah, “Improving the consistency of
vocal tract shape estimation,” 2011 Natl. Conf. Commun., pp. 1–5, Jan. 2011.
[41] J. L. Flanagan, Speech Analysis, Synthesis and Perception, 3rd ed.
Springer-Verlag, 2008.
[42] J. W. Tukey, Exploratory Data Analysis. Addison-Wesley, 1977.