Graduate student: | 李彥農 Yen-nung Lee |
---|---|
Thesis title: | 以循序機率比值檢測作語者確認之研究 A Study on the Speaker Verification based on Sequential Probability Ratio Testing |
Advisor: | 王小川 Hsiao-Chuan Wang |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2007 |
Academic year of graduation: | 95 (ROC academic year) |
Language: | Chinese |
Number of pages: | 60 |
Keywords (Chinese): | 語者確認、循序機率比值檢定、特徵參數正規化、分數正規化 |
Keywords (English): | Speaker verification, Sequential probability ratio test, Feature normalization, Score normalization |
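Text-independent speaker verification is an important research topic in speech processing. Most current speaker verification systems, however, require the user to read a passage aloud and finish recording before the system makes its decision on that utterance, so the user learns the result only after the recording has ended. The goal of this thesis is to design a speaker verification system that can make its decision while the user is still speaking. It adopts the sequential probability ratio test (SPRT) so that the system can report the verification result as soon as enough speech has been collected to decide, without requiring the user to finish the whole utterance, which makes the system more efficient to operate. The SPRT speaker verification system designed in this thesis is built on a GMM-UBM speaker verification framework. The design includes a simple performance adjustment method that adds flexibility, so that the SPRT system can be applied in different settings, for example in a high-security identity authentication system. Several widely used and representative feature normalization and score normalization methods from recent years are then incorporated into the SPRT system, with the aim of improving robustness to variation in the operating environment. Finally, the systems are evaluated on the NIST 2002 one-speaker detection task. The experimental results show that, for the same average number of frames, the SPRT speaker verification system achieves better performance than the original system, and they confirm that the SPRT system retains the characteristics of the different feature normalization and score normalization methods when combined with them.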
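The sequential decision rule described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that per-frame log-likelihood ratios (target-speaker GMM versus UBM) have already been computed, uses Wald's classical threshold approximation, and falls back to an average-score comparison when the utterance ends without a decision. The names `wald_thresholds`, `sprt_verify`, and `fallback_threshold` are hypothetical, and the performance adjustment method mentioned in the abstract is not reproduced here. Wald's error-rate guarantees assume independent observations, which consecutive speech frames are not, so `alpha` and `beta` should be read as tuning knobs rather than exact error rates.

```python
import math
from typing import Iterable, Tuple


def wald_thresholds(alpha: float, beta: float) -> Tuple[float, float]:
    """Return (upper, lower) log-domain thresholds from Wald's approximation.

    alpha: target false-acceptance rate (impostor accepted).
    beta:  target false-rejection rate (true speaker rejected).
    """
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    return upper, lower


def sprt_verify(
    frame_llrs: Iterable[float],
    alpha: float = 0.01,
    beta: float = 0.01,
    fallback_threshold: float = 0.0,  # assumption: used only if frames run out
) -> Tuple[str, int]:
    """Accumulate per-frame log-likelihood ratios and stop as soon as possible.

    Returns (decision, number_of_frames_used).
    """
    upper, lower = wald_thresholds(alpha, beta)
    total = 0.0
    used = 0
    for llr in frame_llrs:
        total += llr
        used += 1
        if total >= upper:
            return "accept", used  # enough evidence for the claimed speaker
        if total <= lower:
            return "reject", used  # enough evidence for an impostor
    if used == 0:
        return "undecided", 0
    # Utterance ended before either threshold was crossed:
    # fall back to a fixed threshold on the average frame score.
    decision = "accept" if total / used >= fallback_threshold else "reject"
    return decision, used


if __name__ == "__main__":
    # With alpha = beta = 0.01 the upper threshold is log(99), about 4.595,
    # so five frames with LLR 1.0 are enough to accept.
    print(sprt_verify([1.0] * 10))  # ('accept', 5)
```

Smaller values of `alpha` and `beta` push the two thresholds further apart, so the test consumes more frames on average before deciding; this is the kind of trade-off between speed and security that a sequential system exposes.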