
Author: 李彥農 (Yen-nung Lee)
Thesis title: 以循序機率比值檢測作語者確認之研究
A Study on the Speaker Verification based on Sequential Probability Ratio Testing
Advisor: 王小川 (Hsiao-Chuan Wang)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 60
Keywords (Chinese): 語者確認, 循序機率比值檢定, 特徵參數正規化, 分數正規化
Keywords (English): Speaker verification, Sequential probability ratio test, Feature normalization, Score normalization
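Text-independent speaker verification is an important research topic in speech processing. Most current speaker verification systems, however, require the user to read a passage aloud and finish recording before the system makes its decision on that speech, so the user learns the result only after the recording ends. The aim of this thesis is to design a speaker verification system that makes its decision while the user is still speaking. It introduces the sequential probability ratio test (SPRT) so that the system can report the verification result as soon as the speech is long enough to support a decision, without requiring the user to record the entire utterance, which makes the system more efficient to operate. The SPRT speaker verification system designed in this thesis is built on a GMM-UBM speaker verification system. The design includes a simple performance-adjustment method that adds flexibility, so that the SPRT system can be applied in different situations, for example to build a high-security identity authentication system. Several representative feature normalization and score normalization methods that have been widely used in recent years are then incorporated into the SPRT system, with the aim of increasing its robustness to variation in the operating environment. Finally, the systems are evaluated on the NIST 2002 one-speaker detection task. The experimental results show that, for the same average number of frames, the SPRT speaker verification system achieves better performance than the original system, and they confirm that the SPRT system retains the characteristics of the feature normalization and score normalization methods it is combined with.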
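
The early-stopping decision described in the abstract can be illustrated with a minimal sketch of Wald's SPRT. This is not the thesis's implementation: the function names `wald_thresholds` and `sprt_decide`, the default error rates, and the assumption that each frame contributes an independent log-likelihood ratio (speaker GMM versus UBM) are all hypothetical choices made for illustration.

```python
import math
from typing import Iterable, Tuple


def wald_thresholds(alpha: float, beta: float) -> Tuple[float, float]:
    """Return Wald's approximate (lower, upper) thresholds in the log domain.

    alpha: target false-acceptance rate (an impostor is accepted)
    beta:  target false-rejection rate (the true speaker is rejected)
    """
    upper = math.log((1.0 - beta) / alpha)  # accept the claim at or above this
    lower = math.log(beta / (1.0 - alpha))  # reject the claim at or below this
    return lower, upper


def sprt_decide(frame_llrs: Iterable[float],
                alpha: float = 0.01,
                beta: float = 0.01) -> Tuple[str, int]:
    """Accumulate per-frame log-likelihood ratios until a threshold is crossed.

    frame_llrs is assumed (hypothetically) to hold
    log p(x_t | claimed speaker model) - log p(x_t | background model)
    for each frame x_t, in arrival order.
    Returns the decision and the number of frames consumed.
    """
    lower, upper = wald_thresholds(alpha, beta)
    total = 0.0
    used = 0
    for used, llr in enumerate(frame_llrs, start=1):
        total += llr
        if total >= upper:
            return "accept", used
        if total <= lower:
            return "reject", used
    # The input ended before either threshold was reached.
    return "undecided", used


if __name__ == "__main__":
    # Synthetic scores only; real values would come from a trained GMM-UBM.
    print(sprt_decide([0.5] * 20))
    print(sprt_decide([-0.5] * 20))
```

Lowering `alpha` raises the acceptance threshold, which is one simple way to trade more frames for a stricter decision, in the spirit of the high-security use case mentioned in the abstract. Consecutive speech frames are correlated in practice, so the nominal error rates should be read as rough guides rather than guarantees.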


    Chapter 1 Introduction
      1.1 Overview of speaker verification systems
      1.2 Research motivation
      1.3 Evaluation platform
      1.4 Chapter overview
    Chapter 2 Speaker verification systems
      2.1 Likelihood ratio test
      2.2 Gaussian mixture models
      2.3 GMM-UBM speaker verification system
        2.3.1 Feature extraction
          2.3.1.1 Pre-emphasis
          2.3.1.2 Framing
          2.3.1.3 Speech detection
          2.3.1.4 Mel-frequency cepstral coefficients
          2.3.1.5 Feature normalization
        2.3.2 Universal background model
        2.3.3 Speaker model adaptation
        2.3.4 Score computation
      2.4 Building the speaker verification system
    Chapter 3 Feature and score normalization
      3.1 Feature normalization
        3.1.1 Cepstral mean subtraction
        3.1.2 Mean and variance normalization
        3.1.3 Feature warping
      3.2 Score normalization
        3.2.1 Zero normalization
        3.2.2 Test normalization
    Chapter 4 Sequential probability ratio test
      4.1 Background theory
      4.2 Threshold estimation
      4.3 Estimating the average number of observations required
      4.4 Applying SPRT to speaker verification systems
        4.4.1 SPRT for the speaker verification system
        4.4.2 Thresholds and number of observations
        4.4.3 Performance adjustment
    Chapter 5 Experiments and discussion
      5.1 Baseline system
        5.1.1 Speech corpus
        5.1.2 System parameter settings
        5.1.3 Performance evaluation method
      5.2 Baseline system performance
        5.2.1 Feature normalization
        5.2.2 Score normalization
        5.2.3 Overall comparison
      5.3 Normalization methods with SPRT
        5.3.1 Feature normalization
        5.3.2 Score normalization
        5.3.3 Overall comparison
      5.4 Number of frames required
    Chapter 6 Conclusion
    References


    Full text: not authorized for public release (on-campus and off-campus networks)
