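Estimation of Oral Cavity and Nasal Cavity Cross-Section Areas from Speech Signals during Articulation | National Tsing Hua University Theses and Dissertations Repository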


Author:	Peng, Shang-Hsuan (彭上軒)
Thesis title:	Estimation of oral cavity and nasal cavity cross-section areas from speech signals during articulation (從語音訊號估計發聲狀態下之口鼻腔截面積)
Advisor:	Liu, Yi-Wen (劉奕汶)
Committee members:	Pei-Chun Li (李沛群); Tai-shih Chi (冀泰石)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication:	2014
Academic year of graduation:	103 (ROC academic year)
Language:	Chinese
Number of pages:	52
Keywords (Chinese):	線性預估, 逆散射, 分析格狀架構
Keywords (English):	linear prediction, inverse scattering, analysis lattice structure

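Recovering the vocal tract area function (VTAF) from a speech signal is an
inverse scattering problem. In medicine, VTAF estimates can be applied to the
clinical analysis of aphasia and dysarthria, to speech training for the
hearing-impaired, and to language learning and speech correction. They provide a
means of visual feedback on sound.
Wakita's analysis method shows that the inverse digital filter of signal analysis
and the acoustic tube model of physics perform the same filtering process. This
equivalence can be explained through the relationship between linear predictive
coding and the analysis lattice structure, and the computation can be accelerated
with the Levinson-Durbin recursive algorithm. Because this approach restricts the
filter transfer function to the all-pole type, speech production is described as a
process in which the vibrating vocal folds act as the source and the sound wave
travels from the throat to the oral cavity and is radiated at the lips. The effect
of nasal resonance is not considered.
This thesis uses the method proposed by Schnell and Lacroix, which is based on
the Burg lattice and recursively describes the transfer function of the whole vocal
tract resonator in pole-zero form. Next, based on this transfer function and on the
structure and approximate dimensions of the human vocal tract, we set suitable
initial and boundary conditions and describe the vocal tract as a three-branched
model composed of a main tract, an oral cavity, and a nasal cavity. Finally, using
simple polynomial factorization together with an exhaustive search for the best
solution, we attempt to estimate the cross-section areas of the oral and nasal
cavities simultaneously while taking nasal resonance into account. We use
nasalized and non-nasalized vowels as test cases to verify the validity of the
acoustic tube model and of the inverse scattering solution.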

Estimating the vocal tract area function (VTAF) from speech signals is an
inverse scattering problem. Medical applications of VTAF estimation include the
clinical analysis of aphasia and dysarthria, language training, and phoniatrics for
the hearing-impaired. The estimated VTAF provides visual feedback on speech
sounds.
According to Wakita's method, the inverse digital filter in signal analysis and the
acoustic tube model in physics perform identical filtering processes. This
equivalence can be explained by the relationship between linear predictive coding
and the analysis lattice structure, and the computation can be made more efficient
with the Levinson-Durbin recursive algorithm. Because the transfer function is
restricted to the all-pole type, Wakita modeled speech production as follows:
sound waves produced at the glottis pass from the throat to the oral cavity and
then radiate at the lips. This model does not take nasal resonance into account.
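As a rough illustration of the all-pole pipeline described in the preceding paragraph, the sketch below runs the Levinson-Durbin recursion on an autocorrelation sequence and converts the resulting reflection coefficients into a relative area profile of a lossless tube. It is a minimal sketch and not the thesis's implementation: the function names, the sign convention of the reflection coefficients, the direction of the area recursion, and the normalized starting area of 1.0 are all assumptions, and published formulations differ on these conventions.

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations recursively.

    r      : autocorrelation values r[0..order]
    returns: (a, ks, err), where a holds the coefficients of the inverse
             filter A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order,
             ks the reflection coefficients, and err the final error power.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    ks = []
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        # Order update: combine the filter with its reversed copy.
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, ks, err


def areas_from_reflection(ks, start_area=1.0):
    """Map reflection coefficients to relative tube-section areas.

    Assumed convention (hypothetical, chosen for illustration):
    A[i+1] = A[i] * (1 - k) / (1 + k), which requires |k| < 1.
    """
    areas = [start_area]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


# Toy autocorrelation of a first-order process, r[n] = 0.5 ** n.
a, ks, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
areas = areas_from_reflection(ks)
```

For this toy input the recursion recovers a first-order inverse filter (the second reflection coefficient is zero), so only the first junction changes the area. Because the areas are defined only up to the assumed starting value, the profile is relative rather than absolute.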
In this thesis, we use an iterative procedure proposed by Schnell and Lacroix to
obtain pole-zero transfer functions. We then set initial and boundary conditions
based on the typical size and shape of the human vocal tract, and describe the
vocal tract as a three-branched model consisting of the main tract, the oral cavity,
and the nasal cavity. Finally, using polynomial factorization and an exhaustive
search, we estimate the oral and nasal cross-section areas while taking nasal
resonance into account. We use nasalized and non-nasalized vowels to verify the
validity of the acoustic tube model and of the solution to the inverse scattering
problem.
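The pole-zero procedure used in the thesis is described as building on the Burg lattice. The sketch below shows only that building block: Burg's estimate of lattice reflection coefficients, computed directly from a signal frame by minimizing the summed forward and backward prediction error power at each stage. It does not implement the iterative pole-zero estimation of Schnell and Lacroix or the three-branched decomposition. The function name and the toy input frame are assumptions made for illustration.

```python
def burg_reflection(x, order):
    """Estimate lattice reflection coefficients with Burg's method.

    x      : one frame of signal samples (sequence of floats)
    order  : number of lattice stages
    returns: list of reflection coefficients, one per stage
    """
    # f[i] is the forward error at time n, b[i] the backward error at
    # time n - 1, so the two lists are aligned for the lattice update.
    f = list(x[1:])
    b = list(x[:-1])
    ks = []
    for _ in range(order):
        num = -2.0 * sum(fi * bi for fi, bi in zip(f, b))
        den = sum(fi * fi for fi in f) + sum(bi * bi for bi in b)
        k = num / den if den > 0.0 else 0.0
        ks.append(k)
        # Lattice update of both error sequences.
        f_new = [fi + k * bi for fi, bi in zip(f, b)]
        b_new = [bi + k * fi for fi, bi in zip(f, b)]
        # Re-align: drop the first forward and the last backward sample.
        f = f_new[1:]
        b = b_new[:-1]
    return ks


# Toy frame: a decaying exponential, x[n] = 0.5 ** n.
frame = [0.5 ** n for n in range(16)]
ks = burg_reflection(frame, order=2)
```

Each coefficient is the ratio of a cross term to the total error power, so its magnitude cannot exceed one. That property keeps the corresponding all-pole filter from becoming unstable, and it is one reason lattice estimates are convenient when the coefficients are later read as tube junctions.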

Table of Contents
Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Articulatory structure and mechanism
1.2 Literature review
1.3 Research motivation and direction
1.4 Chapter outline
Chapter 2 Speech Analysis Models and Methods
2.1 Vocal tract analysis method [3]
2.1.1 Optimal solution of the inverse filter
2.1.2 Acoustic tube model
2.1.3 Relationship between the inverse transfer functions and its applications
2.2 Pole-zero model and estimation of its coefficients
2.2.1 Linear prediction model [1][29][7]
2.2.2 Pole-zero model [37]
2.3 Pole-zero model analysis considering the nasal cavity effect
2.3.1 Step-down (order-reduction) method [39]
2.3.2 Optimal decomposition and exhaustive search
2.4 Experimental procedure and settings
2.4.1 Audio data collection
2.4.2 Framing
2.4.3 Initial and boundary condition settings
2.4.4 Experimental flowchart
Chapter 3 Results and Discussion
3.1 Vowel analysis results
3.1.1 Plausibility of the VTAF estimates
3.1.2 Analysis of oral cavity results
3.1.3 Analysis of nasal cavity results
3.1.4 Consistency analysis
3.2 Comparison of plain and nasalized vowels
3.2.1 Degree of nasalization
3.2.2 Analysis of exceptions
Chapter 4 Conclusions and Future Work
4.1 Conclusions
4.2 Future work
References
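List of Figures
Figure 1.1 Human articulatory structure
Figure 1.2 Schematic of the source-filter theory
Figure 1.3 Lattice structure of the three-branch junction [11]
Figure 2.1 Speech analysis model
Figure 2.2 Speech analysis model equivalent to Figure 2.1
Figure 2.3 Acoustic tube model of the vocal tract [3]
Figure 2.4 Analysis lattice structure of the inverse digital filter [3]
Figure 2.5 Speech production model
Figure 2.6 Flowchart of linear prediction model analysis
Figure 2.7 Flowchart of the pole-zero model and its coefficient estimation [38]
Figure 2.8 Three-branched junction configuration
Figure 2.9 Recording environment and equipment setup
Figure 2.10 Vowel tongue-position chart
Figure 2.11 Experimental flowchart
Figure 3.1 Example of raw VTAF estimation results
Figure 3.2 Example box plot of VTAF estimation results
Figure 3.3 VTAF estimation results using Wakita's method
Figure 3.4 Oral and nasal cavity CSA estimates for the continuously uttered vowels /a/-/u/-/i/, result 1
Figure 3.5 Sagittal sections of vowel tongue positions
Figure 3.6 Oral and nasal cavity CSA estimates for the continuously uttered vowels /a/-/u/-/i/, result 2
Figure 3.7 Oral and nasal cavity CSA estimates for the continuously uttered vowels /a/-/u/-/i/, result 3
Figure 3.8 Oral and nasal cavity CSA estimates for the continuously uttered vowels /a/-/e/-/i/, result 1
Figure 3.9 Oral and nasal cavity CSA estimates for the continuously uttered vowels /a/-/e/-/i/, result 2
Figure 3.10 Oral and nasal cavity CSA estimates for the continuously uttered vowels /a/-/e/-/i/, result 3
Figure 3.11 Oral and nasal cavity CSA estimates for the continuously uttered vowels /a/-/o/-/u/, result 1
Figure 3.12 Oral and nasal cavity CSA estimates for the continuously uttered vowels /a/-/o/-/u/, result 2
Figure 3.13 Oral and nasal cavity CSA estimates for the continuously uttered vowels /a/-/o/-/u/, result 3
Figure 3.14 VTAF estimation results for the vowel /a/ uttered at a normal fundamental frequency
Figure 3.15 VTAF estimation results for the vowel /a/ uttered at a higher fundamental frequency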
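List of Tables
Table 3.1 Mean formants of the vowels
Table 3.2 r-value analysis of the vowel /a/ and the nasalized vowel /a/
Table 3.3 r-value analysis of the vowel /i/ and the nasalized vowel /i/
Table 3.4 r-value analysis of the vowel /e/ and the nasalized vowel /e/
Table 3.5 r-value analysis of the vowel /o/ and the nasalized vowel /o/
Table 3.6 r-value analysis of the vowel /u/ and the nasalized vowel /u/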

References
[1] 王小川 (H.-C. Wang), 語音訊號處理 [Speech Signal Processing], rev. 2nd ed. 全華圖書 (Chuan Hwa Book Co.), 2008.
[2] C. T. Ferrand, Speech Science: An Integrated Approach to Theory and Clinical
Practice, 1st ed. Pearson, 2001.
[3] H. Wakita, “Direct estimation of the vocal tract shape by inverse filtering of
acoustic speech waveforms,” IEEE Trans. Audio Electroacoust., vol. 21, no. 5,
pp. 417–427, Oct. 1973.
[4] D. J. Ertmer, R. E. Stark, and G. R. Karlan, “Real-time spectrographic displays
in vowel production training with children who have profound hearing loss,”
Am. J. Speech-Language Pathol., vol. 5, no. 4, p. 4, Nov. 1996.
[5] S. G. Fletcher et al., “Teaching vowels to profoundly
hearing-impaired speakers using glossometry,” J. Speech Hear. Res., vol. 34,
no. 4, pp. 943–56, Nov. 1990.
[6] S. G. Fletcher et al., “Teaching consonants to profoundly
hearing-impaired speakers using palatometry,” J. Speech Hear. Res., vol. 34,
no. 4, pp. 929–42, Nov. 1990.
[7] J. Makhoul, “Linear prediction: A tutorial review,” Proc. IEEE, vol. 63, no. 4,
pp. 561–580, 1975.
[8] N. Levinson, “The Wiener RMS (root mean square) error criterion in filter design
and prediction,” J. Math. Phys., vol. 25, no. 4, pp. 261–278, 1947.
[9] E. A. Robinson, Statistical communication and detection. New York: Hafner,
1967.
[10] J. Durbin, “The fitting of time-series models,” Rev. l’Institut Int. Stat. / Rev. Int.
Stat. Inst., vol. 28, no. 3, pp. 233–243, Dec. 1960.
[11] I.-T. Lim and B. G. Lee, “Lossless pole-zero modeling of speech signals,”
IEEE Trans. Speech Audio Process., vol. 1, no. 3, pp. 269–276, Jul. 1993.
[12] I.-T. Lim and B. G. Lee, “Lossy pole-zero modeling for speech signals,” IEEE
Trans. Speech Audio Process., vol. 4, no. 2, pp. 81–88, Mar. 1996.
[13] I.-T. Lim and B. G. Lee, “A generalized vocal tract model for pole-zero type
linear prediction (speech processing),” ICASSP-88., Int. Conf. Acoust. Speech,
Signal Process., pp. 687–690, 1988.
[14] A. Lacroix, “Improved vocal tract model for the analysis of nasal speech
sounds,” 1996 IEEE Int. Conf. Acoust. Speech, Signal Process. Conf. Proc., vol.
2, pp. 801–804, 1996.
[15] J. Schroeter, “Techniques for estimating vocal-tract shapes from the speech
signal,” IEEE Trans. Speech Audio Process., vol. 2, no. 1, pp. 133–150, 1994.
[16] H. Kamata, H. Oka, and Y. Ishida, “Estimation of vocal tract transfer function
considering the glottis open and close characteristics,” Proc. IEEE Pacific Rim
Conf. Commun. Comput. Signal Process., vol. 1, pp. 137–140, 1993.
[17] H. Kamata, K. Kawaguchi, Y. Ishida, and T. Honda, “Reconstruction of human
voice using parallel structure transfer function and its estimation error,” IEEE
Pacific Rim Conf. Commun. Comput. Signal Process. Proc., pp. 575–580,
1995.
[18] H. Deng, R. K. Ward, M. P. Beddoes, and M. Hodgson, “A new method for
obtaining accurate estimates of vocal-tract filters and glottal waves from vowel
sounds,” IEEE Trans. Audio, Speech Lang. Process., vol. 14, no. 2, pp.
445–455, Mar. 2006.
[19] R. K. Ward, M. P. Beddoes, and M. Hodgson, “Estimating vocal-tract area
functions from vowel sound signals over closed glottal phases,” 2004 IEEE Int.
Conf. Acoust. Speech, Signal Process., vol. 1, pp. I–589–92, 2004.
[20] R. K. Ward, M. P. Beddoes, and M. Hodgson, “Effects of Glottal and Lip
Boundary Conditions on Vocal-Tract Area Function Estimates from Speech
Signals,” 2005 IEEE Int. Conf. Acoust. Speech, Signal Process., vol. 1, pp.
901–904, 2005.
[21] R. K. Ward, M. P. Beddoes, and D. O’Shaughnessy, “Obtaining LIP and
Glottal Reflection Coefficients from Vowel Sounds,” 2006 IEEE Int. Conf.
Acoust. Speech Signal Process. Proc., vol. 1, pp. I–373–I–376, 2006.
[22] H. Deng, R. Ward, and M. Beddoes, “Glottal waves via inverse filtering of
vowel sounds,” Conf. Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., vol. 7,
pp. 7000–3, Jan. 2005.
[23] J. Makhoul and R. Viswanathan, “Adaptive lattice methods for linear
prediction,” ICASSP ’78. IEEE Int. Conf. Acoust. Speech, Signal Process., vol.
3, pp. 83–86, 1978.
[24] T. Carter, “Study of an adaptive lattice structure for linear prediction analysis
of speech,” ICASSP ’78. IEEE Int. Conf. Acoust. Speech, Signal Process., vol.
3, pp. 27–30, 1978.
[25] J. Makhoul and L. Cosell, “Adaptive lattice analysis of speech,” IEEE Trans.
Acoust., vol. 29, no. 3, pp. 654–659, Jun. 1981.
[26] S. Dandapat and G. C. Ray, “Evaluation of vocal tract disorder using a variable
step adaptive filter,” Proc. First Reg. Conf. IEEE Eng. Med. Biol. Soc. 14th
Conf. Biomed. Eng. Soc. India. An Int. Meet., pp. 3/81–3/82, 1995.
[27] A. Lacroix, “Pole-zero modeling of vocal tract for fricative sounds,” 1997
IEEE Int. Conf. Acoust. Speech, Signal Process., vol. 3, pp. 1659–1662, 1997.
[28] P. Kabal and R. P. Ramachandran, “The computation of line spectral
frequencies using Chebyshev polynomials,” IEEE Trans. Acoust., vol. 34, no. 6,
pp. 1419–1426, Dec. 1986.
[29] J. D. Markel and A. H. Gray, Linear Prediction of Speech. New York:
Springer-Verlag, 1976.
[30] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing,
International Edition, 3rd ed. Pearson, 2009.
[31] S. M. Kay and S. L. Marple, “Spectrum analysis: A modern perspective,” Proc.
IEEE, vol. 69, no. 11, pp. 1380–1419, 1981.
[32] J. P. Burg, “Maximum entropy spectral analysis,” Stanford Univ., Stanford,
CA, 1975.
[33] J. Makhoul, “Stable and efficient lattice methods for linear prediction,” IEEE
Trans. Acoust., vol. 25, no. 5, pp. 423–428, Oct. 1977.
[34] P. M. Morse and K. U. Ingard, Theoretical Acoustics. New York: McGraw-Hill,
1968, pp. 244–252.
[35] B. S. Atal, “Speech Analysis and Synthesis by Linear Prediction of the Speech
Wave,” J. Acoust. Soc. Am., vol. 50, no. 2B, p. 637, 1971.
[36] F. Itakura and S. Saito, “Digital Filtering Techniques for Speech Analysis and
Synthesis,” Proc.7th Int. Conf. Acoust., 1971.
[37] K. Schnell and A. Lacroix, “Pole zero estimation from speech signals by an
iterative procedure,” 2001 IEEE Int. Conf. Acoust. Speech, Signal Process.
Proc. (Cat. No.01CH37221), vol. 1, pp. 109–112, 2001.
[38] H.-K. Huang, Y.-W. Liu, and R. P.-Y. Chiang, “Detection of obstructive sleep
apnea by estimation of oral and nasal cavity cross-section areas from acoustic
recordings of snore,” Proc. Meet. Acoust., vol. 19, no. 1, pp. 060172–060172,
Jun. 2013.
[39] J. O. Smith, “Introduction to Digital Filters with Audio Applications.” [Online].
Available: https://ccrma.stanford.edu/~jos/filters/. [Accessed: 31-Jul-2014].
[40] K. S. Nataraj, P. C. Pandey, and M. S. Shah, “Improving the consistency of
vocal tract shape estimation,” 2011 Natl. Conf. Commun., pp. 1–5, Jan. 2011.
[41] J. L. Flanagan, Speech analysis synthesis and perception, 3rd ed.
Springer-Verlag, 2008.
[42] J. W. Tukey, Exploratory Data Analysis. Addison-Wesley, 1977.

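Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)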
