
Student: 彭上軒 (Peng, Shang-Hsuan)
Thesis title: Estimation of oral cavity and nasal cavity cross-section areas from speech signals during articulation (original Chinese title: 從語音訊號估計發聲狀態下之口鼻腔截面積)
Advisor: 劉奕汶 (Liu, Yi-Wen)
Oral defense committee: 李沛群 (Pei-Chun Li), 冀泰石 (Tai-shih Chi)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2014
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Number of pages: 52
Keywords: linear prediction, inverse scattering, analysis lattice structure
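    Estimating the vocal tract area function (VTAF) from speech signals is an
    inverse scattering problem. In medicine, VTAF estimates can be applied to the
    clinical analysis of aphasia and dysarthria, speech training for the
    hearing-impaired, language learning, and speech correction, providing a means
    of visual feedback on speech sounds.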
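
    Wakita's analysis method shows that the inverse digital filter of signal
    analysis and the physical acoustic tube model perform the same filtering
    process. This equivalence can be explained through the transformation
    between linear prediction coding and the analysis lattice structure, and the
    Levinson-Durbin recursive algorithm speeds up the computation. Because this
    approach restricts the filter's transfer function to the all-pole type, speech
    production is described as vocal fold vibration acting as the source, with
    the sound wave traveling from the throat to the oral cavity and radiating
    from the lips; the effect of nasal resonance is not considered.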
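
    This thesis uses the method proposed by Schnell and Lacroix, which builds on
    the Burg lattice to describe, recursively, the transfer function of the entire
    vocal tract resonator in pole-zero form. Next, based on this transfer function
    and on the structure and approximate dimensions of the human vocal tract,
    suitable initial and boundary conditions are set, and the vocal tract is
    described as a three-branched model composed of the main tract, the oral
    cavity, and the nasal cavity. Finally, using simple polynomial factorization
    combined with an exhaustive search for the best solution, we attempt to
    estimate the cross-sectional areas of the oral and nasal cavities
    simultaneously while accounting for nasal resonance. Nasalized and
    non-nasalized vowels are used as test cases to verify the plausibility of
    the acoustic tube model and of the inverse scattering solution.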
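The abstract describes the Schnell and Lacroix procedure as built on a Burg lattice. As a minimal sketch of that building block only (the pole-zero iteration itself is not reproduced here), Burg's method estimates each reflection coefficient directly from the forward and backward prediction errors of the lattice, without forming an autocorrelation matrix. The Python below is illustrative: the function name, the sign convention A(z) = 1 + a1 z^-1 + ... and the error checks are assumptions, not the thesis's code.

```python
from typing import List, Sequence


def burg_reflection_coefficients(x: Sequence[float], order: int) -> List[float]:
    """Burg lattice estimate of reflection coefficients k[0..order-1].

    Assumed sign convention: A(z) = 1 + a1 z^-1 + ..., so that k_m equals the
    last predictor coefficient a_m of the order-m filter.
    """
    n_samples = len(x)
    if order >= n_samples:
        raise ValueError("order must be smaller than the frame length")
    f = [float(v) for v in x]  # forward prediction error of the current stage
    b = list(f)                # backward prediction error of the current stage
    ks: List[float] = []
    for m in range(1, order + 1):
        # Pair f_{m-1}[n] with b_{m-1}[n-1]; both are valid for n >= m.
        num = sum(f[n] * b[n - 1] for n in range(m, n_samples))
        den = sum(f[n] ** 2 + b[n - 1] ** 2 for n in range(m, n_samples))
        if den == 0.0:
            raise ValueError("zero-energy frame")
        km = -2.0 * num / den  # |km| <= 1 because 2|fb| <= f^2 + b^2
        ks.append(km)
        new_f, new_b = list(f), list(b)
        for n in range(m, n_samples):
            new_f[n] = f[n] + km * b[n - 1]   # f_m[n] = f_{m-1}[n] + k_m b_{m-1}[n-1]
            new_b[n] = b[n - 1] + km * f[n]   # b_m[n] = b_{m-1}[n-1] + k_m f_{m-1}[n]
        f, b = new_f, new_b
    return ks
```

Because each coefficient is a ratio of the form -2·Σfb / Σ(f² + b²), |k| ≤ 1 holds by construction, which matters when reflection coefficients are to be read as junctions between tube sections of positive area.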


    Estimation of the vocal tract area function (VTAF) from speech signals is an
    inverse scattering problem. Medical applications of VTAF include the clinical
    analysis of aphasia and dysarthria, as well as language training and
    phoniatrics for the hearing-impaired. VTAF estimates provide visual feedback
    on speech sounds.

    According to Wakita's method, the inverse digital filter used in signal
    analysis and the acoustic tube model from physics perform identical filtering
    processes. This can be shown through the relationship between linear
    prediction coding and the analysis lattice structure, and the computation can
    be made more efficient with the Levinson-Durbin recursive algorithm. Because
    the transfer function is restricted to be all-pole, Wakita modeled speech
    production as follows: sound waves produced at the glottis pass from the
    throat to the oral cavity and are then radiated at the lips. This procedure
    therefore does not take nasal resonance into account.
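Wakita's equivalence can be illustrated in a few lines: the Levinson-Durbin recursion solves the autocorrelation normal equations of the inverse filter and, as a by-product, yields the reflection coefficients of the analysis lattice, which map to ratios of adjacent tube areas. The following Python is a minimal sketch, not the thesis's code; the function names, the biased autocorrelation estimate, and the sign convention k = (A[m] - A[m+1]) / (A[m] + A[m+1]) are assumptions, since conventions differ between texts.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], List[float], float]:
    """Solve the normal equations of the inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, k, err): a[0] == 1.0, k holds the reflection (PARCOR) coefficients
    of the equivalent analysis lattice, and err is the final prediction-error power.
    """
    if order >= len(r):
        raise ValueError("need r[0..order]")
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")
    a = [1.0] + [0.0] * order
    k: List[float] = []
    err = float(r[0])
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("prediction error vanished; reduce the order")
        # Correlation of the order-(m-1) prediction error with the sample m steps back.
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        km = -acc / err
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + km * prev[m - j]
        a[m] = km
        k.append(km)
        err *= 1.0 - km * km  # error power shrinks by (1 - k^2) at each stage
    return a, k, err


def reflection_to_areas(k: Sequence[float], reference_area: float = 1.0) -> List[float]:
    """Relative tube areas from reflection coefficients.

    Assumed convention: k[m] = (A[m] - A[m+1]) / (A[m] + A[m+1]),
    with A[0] the reference section.
    """
    areas = [reference_area]
    for km in k:
        if abs(km) >= 1.0:
            raise ValueError("|k| must be < 1 for a physical tube")
        areas.append(areas[-1] * (1.0 - km) / (1.0 + km))
    return areas
```

For example, with r = [1.0, 0.5, 0.25] (the autocorrelation of a first-order process), levinson_durbin(r, 2) gives a ≈ [1, -0.5, 0], k ≈ [-0.5, 0] and a prediction-error power of 0.75. Which physical end of the tube (glottis or lips) corresponds to index 0 depends on the convention adopted; here it is simply called the reference section.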

    In this thesis, we use an iterative, Burg-lattice-based procedure proposed by
    Schnell and Lacroix to obtain pole-zero transfer functions. We then set
    initial and boundary conditions based on the typical size and shape of the
    human vocal tract, and describe the vocal tract as a three-branched model
    consisting of the main tract, the oral cavity, and the nasal cavity. Finally,
    with the assistance of polynomial factorization and an exhaustive search, we
    estimate the oral and nasal cavity cross-section areas simultaneously while
    accounting for nasal resonance. Nasalized and non-nasalized vowels are used
    to verify the validity of the acoustic tube model and of the solution to the
    inverse scattering problem.
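The final step, factorization plus exhaustive search, can be pictured as trying every way of assigning the factors of a polynomial to two branches and keeping the assignment with the lowest cost. The sketch below assumes the roots have already been found (for instance with numpy.roots), keeps complex-conjugate pairs together so that each branch polynomial has real coefficients, and takes the cost function as a parameter. All function names are hypothetical, and the toy cost in the demo is purely illustrative; it is not the criterion used in the thesis.

```python
import cmath
import itertools
from typing import Callable, List, Sequence, Tuple

Group = Tuple[complex, ...]


def poly_from_roots(roots: Sequence[complex]) -> List[complex]:
    """Expand prod_i (1 - r_i z^-1) into [1, c1, ..., cn] (powers of z^-1)."""
    coeffs: List[complex] = [1 + 0j]
    for r in roots:
        nxt = coeffs + [0j]
        for i in range(1, len(nxt)):
            nxt[i] -= r * coeffs[i - 1]
        coeffs = nxt
    return coeffs


def group_conjugates(roots: Sequence[complex], tol: float = 1e-8) -> List[Group]:
    """Keep each complex root together with its conjugate; real roots stand alone."""
    remaining = [complex(r) for r in roots]
    groups: List[Group] = []
    while remaining:
        r = remaining.pop(0)
        if abs(r.imag) <= tol:
            groups.append((r,))
            continue
        if not remaining:
            raise ValueError("complex root without a conjugate partner")
        j = min(range(len(remaining)), key=lambda i: abs(remaining[i] - r.conjugate()))
        if abs(remaining[j] - r.conjugate()) > tol:
            raise ValueError("complex root without a conjugate partner")
        groups.append((r, remaining.pop(j)))
    return groups


def exhaustive_split(
    groups: Sequence[Group],
    n_nasal_groups: int,
    cost: Callable[[List[complex], List[complex]], float],
) -> Tuple[float, List[complex], List[complex]]:
    """Try every assignment of n_nasal_groups root groups to the nasal branch.

    Returns (best_cost, oral_roots, nasal_roots); ties keep the first split found.
    """
    best = None
    for combo in itertools.combinations(range(len(groups)), n_nasal_groups):
        chosen = set(combo)
        nasal = [r for i in combo for r in groups[i]]
        oral = [r for i, g in enumerate(groups) if i not in chosen for r in g]
        c = cost(oral, nasal)
        if best is None or c < best[0]:
            best = (c, oral, nasal)
    if best is None:
        raise ValueError("n_nasal_groups exceeds the number of root groups")
    return best


if __name__ == "__main__":
    p, q = 0.9 * cmath.exp(0.3j), 0.8 * cmath.exp(1.2j)
    demo_groups = group_conjugates([p, p.conjugate(), q, q.conjugate(), 0.5])

    def toy_cost(oral: List[complex], nasal: List[complex]) -> float:
        # Hypothetical criterion: prefer low-frequency (small-angle) nasal roots.
        return sum(abs(cmath.phase(r)) for r in nasal)

    best_cost, oral_roots, nasal_roots = exhaustive_split(demo_groups, 1, toy_cost)
    print(best_cost, nasal_roots, [round(c.real, 4) for c in poly_from_roots(oral_roots)])
```

The search visits C(G, n) candidate splits for G root groups, which stays manageable for low model orders; a realistic cost would compare the resulting branch models against the measured spectrum or against anatomical constraints, which this sketch leaves open.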

    Table of Contents
    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1 Speech production anatomy and mechanism
      1.2 Literature review
      1.3 Research motivation and direction
      1.4 Chapter outline
    Chapter 2  Speech analysis models and methods
      2.1 Vocal tract analysis method [3]
        2.1.1 Optimal solution of the inverse filter
        2.1.2 Acoustic tube model
        2.1.3 Relationship between the inverse transfer functions and its applications
      2.2 Pole-zero model and coefficient estimation
        2.2.1 Linear prediction model [1][29][7]
        2.2.2 Pole-zero model [37]
      2.3 Pole-zero model analysis accounting for nasal effects
        2.3.1 Stepwise order reduction [39]
        2.3.2 Optimal decomposition and exhaustive search
      2.4 Experimental procedure and setup
        2.4.1 Audio data collection
        2.4.2 Framing
        2.4.3 Initial and boundary conditions
        2.4.4 Experiment flowchart
    Chapter 3  Results and discussion
      3.1 Vowel analysis results
        3.1.1 Plausibility of the VTAF estimates
        3.1.2 Oral cavity results
        3.1.3 Nasal cavity results
        3.1.4 Consistency analysis
      3.2 Comparison of ordinary and nasalized vowels
        3.2.1 Degree of nasalization
        3.2.2 Analysis of exceptions
    Chapter 4  Conclusions and future work
      4.1 Conclusions
      4.2 Future work
    References

    List of Figures
    Fig. 1.1  Human speech production anatomy
    Fig. 1.2  Schematic of the source-filter theory
    Fig. 1.3  Three-branched lattice structure [11]
    Fig. 2.1  Speech analysis model
    Fig. 2.2  Speech analysis model equivalent to Fig. 2.1
    Fig. 2.3  Acoustic tube model of the vocal tract [3]
    Fig. 2.4  Analysis lattice structure of the inverse digital filter [3]
    Fig. 2.5  Speech production model
    Fig. 2.6  Flowchart of the linear prediction model analysis
    Fig. 2.7  Flowchart of the pole-zero model and its coefficient estimation [38]
    Fig. 2.8  Three-branched model
    Fig. 2.9  Recording environment and equipment setup
    Fig. 2.10 Vowel chart of tongue positions
    Fig. 2.11 Experiment flowchart
    Fig. 3.1  Example of raw VTAF estimation results
    Fig. 3.2  Example box-plot representation of VTAF estimation results
    Fig. 3.3  VTAF estimation results using Wakita's method
    Fig. 3.4  Oral and nasal cavity CSA estimates for the continuously produced vowels /a/-/u/-/i/, result 1
    Fig. 3.5  Sagittal section of vowel tongue positions
    Fig. 3.6  Oral and nasal cavity CSA estimates for the continuously produced vowels /a/-/u/-/i/, result 2
    Fig. 3.7  Oral and nasal cavity CSA estimates for the continuously produced vowels /a/-/u/-/i/, result 3
    Fig. 3.8  Oral and nasal cavity CSA estimates for the continuously produced vowels /a/-/e/-/i/, result 1
    Fig. 3.9  Oral and nasal cavity CSA estimates for the continuously produced vowels /a/-/e/-/i/, result 2
    Fig. 3.10 Oral and nasal cavity CSA estimates for the continuously produced vowels /a/-/e/-/i/, result 3
    Fig. 3.11 Oral and nasal cavity CSA estimates for the continuously produced vowels /a/-/o/-/u/, result 1
    Fig. 3.12 Oral and nasal cavity CSA estimates for the continuously produced vowels /a/-/o/-/u/, result 2
    Fig. 3.13 Oral and nasal cavity CSA estimates for the continuously produced vowels /a/-/o/-/u/, result 3
    Fig. 3.14 VTAF estimation results for the vowel /a/ produced at a normal fundamental frequency
    Fig. 3.15 VTAF estimation results for the vowel /a/ produced at a higher fundamental frequency

    List of Tables
    Table 3.1 Average formants of the vowels
    Table 3.2 r-value analysis of the vowel /a/ and the nasalized vowel /a/
    Table 3.3 r-value analysis of the vowel /i/ and the nasalized vowel /i/
    Table 3.4 r-value analysis of the vowel /e/ and the nasalized vowel /e/
    Table 3.5 r-value analysis of the vowel /o/ and the nasalized vowel /o/
    Table 3.6 r-value analysis of the vowel /u/ and the nasalized vowel /u/

    References
    [1] 王小川, 語音訊號處理 (Speech Signal Processing), rev. 2nd ed. 全華圖書 (Chuan Hwa Book Co.), 2008.
    [2] C. T. Ferrand, Speech Science: An Integrated Approach to Theory and Clinical
    Practice, 1st ed. Pearson, 2001.
    [3] H. Wakita, “Direct estimation of the vocal tract shape by inverse filtering of
    acoustic speech waveforms,” IEEE Trans. Audio Electroacoust., vol. 21, no. 5,
    pp. 417–427, Oct. 1973.
    [4] D. J. Ertmer, R. E. Stark, and G. R. Karlan, “Real-time spectrographic displays
    in vowel production training with children who have profound hearing loss,”
    Am. J. Speech-Language Pathol., vol. 5, no. 4, p. 4, Nov. 1996.
    [5] S. G. Fletcher et al., “Teaching vowels to profoundly hearing-impaired
    speakers using glossometry,” J. Speech Hear. Res., vol. 34, no. 4,
    pp. 943–956, Nov. 1990.
    [6] S. G. Fletcher et al., “Teaching consonants to profoundly hearing-impaired
    speakers using palatometry,” J. Speech Hear. Res., vol. 34, no. 4,
    pp. 929–942, Nov. 1990.
    [7] J. Makhoul, “Linear prediction: A tutorial review,” Proc. IEEE, vol. 63, no. 4,
    pp. 561–580, 1975.
    [8] N. Levinson, “The Wiener RMS (root mean square) error criterion in filter
    design and prediction,” J. Math. Phys., vol. 25, no. 4, pp. 261–278, 1947.
    [9] E. A. Robinson, Statistical communication and detection. New York: Hafner,
    1967.
    [10] J. Durbin, “The fitting of time-series models,” Rev. Int. Stat. Inst.,
    vol. 28, no. 3, pp. 233–243, Dec. 1960.
    [11] I.-T. Lim and B. G. Lee, “Lossless pole-zero modeling of speech signals,”
    IEEE Trans. Speech Audio Process., vol. 1, no. 3, pp. 269–276, Jul. 1993.
    [12] I.-T. Lim and B. G. Lee, “Lossy pole-zero modeling for speech signals,” IEEE
    Trans. Speech Audio Process., vol. 4, no. 2, pp. 81–88, Mar. 1996.
    [13] I.-T. Lim and B. G. Lee, “A generalized vocal tract model for pole-zero type
    linear prediction (speech processing),” ICASSP-88, Int. Conf. Acoust. Speech,
    Signal Process., pp. 687–690, 1988.
    [14] A. Lacroix, “Improved vocal tract model for the analysis of nasal speech
    sounds,” 1996 IEEE Int. Conf. Acoust. Speech, Signal Process. Conf. Proc., vol.
    2, pp. 801–804, 1996.
    [15] J. Schroeter, “Techniques for estimating vocal-tract shapes from the speech
    signal,” IEEE Trans. Speech Audio Process., vol. 2, no. 1, pp. 133–150, 1994.
    [16] H. Kamata, H. Oka, and Y. Ishida, “Estimation of vocal tract transfer function
    considering the glottis open and close characteristics,” Proc. IEEE Pacific Rim
    Conf. Commun. Comput. Signal Process., vol. 1, pp. 137–140, 1993.
    [17] H. Kamata, K. Kawaguchi, Y. Ishida, and T. Honda, “Reconstruction of human
    voice using parallel structure transfer function and its estimation error,” IEEE
    Pacific Rim Conf. Commun. Comput. Signal Process. Proc., pp. 575–580,
    1995.
    [18] H. Deng, R. K. Ward, M. P. Beddoes, and M. Hodgson, “A new method for
    obtaining accurate estimates of vocal-tract filters and glottal waves from vowel
    sounds,” IEEE Trans. Audio, Speech Lang. Process., vol. 14, no. 2, pp.
    445–455, Mar. 2006.
    [19] R. K. Ward, M. P. Beddoes, and M. Hodgson, “Estimating vocal-tract area
    functions from vowel sound signals over closed glottal phases,” 2004 IEEE Int.
    Conf. Acoust. Speech, Signal Process., vol. 1, pp. I-589–I-592, 2004.
    [20] R. K. Ward, M. P. Beddoes, and M. Hodgson, “Effects of Glottal and Lip
    Boundary Conditions on Vocal-Tract Area Function Estimates from Speech
    Signals,” 2005 IEEE Int. Conf. Acoust. Speech, Signal Process., vol. 1, pp.
    901–904, 2005.
    [21] R. K. Ward, M. P. Beddoes, and D. O’Shaughnessy, “Obtaining Lip and
    Glottal Reflection Coefficients from Vowel Sounds,” 2006 IEEE Int. Conf.
    Acoust. Speech, Signal Process. Proc., vol. 1, pp. I-373–I-376, 2006.
    [22] H. Deng, R. Ward, and M. Beddoes, “Glottal waves via inverse filtering of
    vowel sounds,” Conf. Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., vol. 7,
    pp. 7000–7003, Jan. 2005.
    [23] J. Makhoul and R. Viswanathan, “Adaptive lattice methods for linear
    prediction,” ICASSP ’78. IEEE Int. Conf. Acoust. Speech, Signal Process., vol.
    3, pp. 83–86, 1978.
    [24] T. Carter, “Study of an adaptive lattice structure for linear prediction analysis
    of speech,” ICASSP ’78. IEEE Int. Conf. Acoust. Speech, Signal Process., vol.
    3, pp. 27–30, 1978.
    [25] J. Makhoul and L. Cosell, “Adaptive lattice analysis of speech,” IEEE Trans.
    Acoust., Speech, Signal Process., vol. 29, no. 3, pp. 654–659, Jun. 1981.
    [26] S. Dandapat and G. C. Ray, “Evaluation of vocal tract disorder using a variable
    step adaptive filter,” Proc. First Reg. Conf. IEEE Eng. Med. Biol. Soc. 14th
    Conf. Biomed. Eng. Soc. India. An Int. Meet., pp. 3/81–3/82, 1995.
    [27] A. Lacroix, “Pole-zero modeling of vocal tract for fricative sounds,” 1997
    IEEE Int. Conf. Acoust. Speech, Signal Process., vol. 3, pp. 1659–1662, 1997.
    [28] P. Kabal and R. P. Ramachandran, “The computation of line spectral
    frequencies using Chebyshev polynomials,” IEEE Trans. Acoust., Speech, Signal Process., vol. 34, no. 6,
    pp. 1419–1426, Dec. 1986.
    [29] J. D. Markel and A. H. Gray, Linear Prediction of Speech. New York:
    Springer-Verlag, 1976.
    [30] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing,
    International Edition, 3rd ed. Pearson, 2009.
    [31] S. M. Kay and S. L. Marple, “Spectrum analysis: A modern perspective,” Proc.
    IEEE, vol. 69, no. 11, pp. 1380–1419, 1981.
    [32] J. P. Burg, “Maximum entropy spectral analysis,” Ph.D. dissertation, Stanford
    Univ., Stanford, CA, 1975.
    [33] J. Makhoul, “Stable and efficient lattice methods for linear prediction,” IEEE
    Trans. Acoust., Speech, Signal Process., vol. 25, no. 5, pp. 423–428, Oct. 1977.
    [34] P. M. Morse and K. U. Ingard, Theoretical Acoustics. New York: McGraw-Hill,
    1968, pp. 244–252.
    [35] B. S. Atal, “Speech Analysis and Synthesis by Linear Prediction of the Speech
    Wave,” J. Acoust. Soc. Am., vol. 50, no. 2B, p. 637, 1971.
    [36] F. Itakura and S. Saito, “Digital Filtering Techniques for Speech Analysis and
    Synthesis,” Proc. 7th Int. Conf. Acoust., 1971.
    [37] K. Schnell and A. Lacroix, “Pole zero estimation from speech signals by an
    iterative procedure,” 2001 IEEE Int. Conf. Acoust. Speech, Signal Process.
    Proc. (Cat. No.01CH37221), vol. 1, pp. 109–112, 2001.
    [38] H.-K. Huang, Y.-W. Liu, and R. P.-Y. Chiang, “Detection of obstructive sleep
    apnea by estimation of oral and nasal cavity cross-section areas from acoustic
    recordings of snore,” Proc. Meet. Acoust., vol. 19, no. 1, p. 060172,
    Jun. 2013.
    [39] J. O. Smith, “Introduction to Digital Filters with Audio Applications.” [Online].
    Available: https://ccrma.stanford.edu/~jos/filters/. [Accessed: 31-Jul-2014].
    [40] K. S. Nataraj, P. C. Pandey, and M. S. Shah, “Improving the consistency of
    vocal tract shape estimation,” 2011 Natl. Conf. Commun., pp. 1–5, Jan. 2011.
    [41] J. L. Flanagan, Speech Analysis, Synthesis and Perception, 3rd ed.
    Springer-Verlag, 2008.
    [42] J. W. Tukey, Exploratory Data Analysis. Addison-Wesley, 1977.

    Full-text release: not authorized for public release (on-campus network)
    Full-text release: not authorized for public release (off-campus network)
