Graduate Student: | 楊壁如 Pi-Ju Yang
---|---
Thesis Title: | 語者/歌者識別 Speaker/Singer Identification
Advisor: | 張智星 Jyh-Shing Roger Jang
Oral Defense Committee: |
Degree: | 博士 Doctor
Department: | 電機資訊學院 - 資訊工程學系 Computer Science (College of Electrical Engineering and Computer Science, Department of Computer Science)
Publication Year: | 2000
Graduation Academic Year: | 88 (academic year 1999-2000)
Language: | Chinese
Keywords (Chinese): | 語者識別、歌者識別、語者辨識、k-最近鄰居法則、特徵參數擷取、梅爾刻度式倒頻譜、語音訊號、線性識別分析
Keywords (English): | speaker identification, singer identification, speaker recognition, k-NN rule, feature extraction, mel-frequency cepstrum, speech signal, linear discriminant analysis
This thesis addresses the identification of speakers and singers. Whether a person is speaking or singing, the most important factor in letting a computer identify that person correctly is feature extraction: finding a set of feature parameters that represents each speaker's or singer's vocal characteristics, is not easily disturbed by the environment, and is robust enough to maintain a stable level of identification performance across different users and recording backgrounds.
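As an illustration of the mel-frequency cepstrum features referred to above, the following is a minimal MATLAB sketch of MFCC extraction. The frame length, hop size, filterbank size, and cepstrum order (25 ms / 10 ms / 20 filters / 12 coefficients) are assumed values chosen for the example, not the settings used in the thesis.

```matlab
% Minimal MFCC extraction sketch (illustrative settings, not the thesis').
% x: mono speech/singing signal, fs: sampling rate in Hz.
function mfcc = extract_mfcc(x, fs)
    x        = x(:);
    frameLen = round(0.025 * fs);                    % 25 ms analysis frame
    hop      = round(0.010 * fs);                    % 10 ms frame shift
    nFilt    = 20;                                   % mel filterbank channels
    nCep     = 12;                                   % cepstral coefficients kept
    nFFT     = 2^nextpow2(frameLen);

    win    = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));  % Hamming window
    melFB  = mel_filterbank(nFilt, nFFT, fs);
    dctMat = cos((1:nCep)' * ((1:nFilt) - 0.5) * pi / nFilt);     % DCT-II basis
    nFrame = floor((length(x) - frameLen) / hop) + 1;
    mfcc   = zeros(nFrame, nCep);

    for i = 1:nFrame
        seg = x((i-1)*hop + (1:frameLen)) .* win;    % windowed frame
        mag = abs(fft(seg, nFFT));                   % magnitude spectrum
        fbE = log(melFB * mag(1:nFFT/2+1) + eps);    % log mel-band energies
        mfcc(i, :) = (dctMat * fbE)';                % cepstrum via DCT
    end
end

function fb = mel_filterbank(nFilt, nFFT, fs)
    % Triangular filters spaced uniformly on the mel scale up to fs/2.
    mel   = @(f) 2595 * log10(1 + f/700);
    imel  = @(m) 700 * (10.^(m/2595) - 1);
    edges = imel(linspace(0, mel(fs/2), nFilt + 2)); % filter edge frequencies (Hz)
    bins  = floor(nFFT/2 * edges / (fs/2)) + 1;      % corresponding FFT bins
    fb    = zeros(nFilt, nFFT/2 + 1);
    for k = 1:nFilt
        lo = bins(k); mid = bins(k+1); hi = bins(k+2);
        fb(k, lo:mid) = linspace(0, 1, mid - lo + 1);   % rising edge
        fb(k, mid:hi) = linspace(1, 0, hi - mid + 1);   % falling edge
    end
end
```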
After the important features are selected, speaker/singer identification is carried out. We mainly use the k-nearest-neighbor (k-NN) rule as the basis for classification, but its computation time is long, so we adopt several data-reduction methods when training the speaker models. Data reduction consists of two parts: reducing the amount of data and reducing the data dimensionality. To reduce the amount of data we use vector-quantization methods such as k-means, fuzzy c-means, and learning vector quantization; to reduce the dimensionality we experiment with linear discriminant analysis.
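A minimal MATLAB sketch of this classification stage is given below: each enrolled speaker's (or singer's) training frames are first condensed into a small codebook with a plain k-means pass (one of the vector-quantization options listed above), and a test utterance is then identified by frame-level k-NN voting over the pooled codebooks. The function names, codebook size, and neighbor count are illustrative assumptions; the fuzzy c-means, LVQ, and LDA variants are not shown.

```matlab
% Sketch of k-NN identification over per-speaker k-means codebooks.
% trainFeats : cell array, trainFeats{s} is an (N_s x D) feature matrix
% testFeat   : (T x D) feature matrix of the unknown utterance
% M, k       : codebook size per speaker and number of nearest neighbors
function spkId = identify_speaker(trainFeats, testFeat, M, k)
    nSpk   = numel(trainFeats);
    D      = size(testFeat, 2);
    refX   = zeros(nSpk*M, D);                       % pooled codebook vectors
    refLab = zeros(nSpk*M, 1);                       % speaker label per codeword
    for s = 1:nSpk
        rows = (s-1)*M + (1:M);
        refX(rows, :) = simple_kmeans(trainFeats{s}, M);
        refLab(rows)  = s;
    end

    votes = zeros(nSpk, 1);
    for t = 1:size(testFeat, 1)
        d = sum((refX - testFeat(t, :)).^2, 2);      % squared distances
        [~, idx] = sort(d);
        lab = refLab(idx(1:k));                      % labels of k nearest codewords
        votes = votes + accumarray(lab, 1, [nSpk, 1]);  % frame-level votes
    end
    [~, spkId] = max(votes);                         % most-voted speaker wins
end

function C = simple_kmeans(X, M)
    % Plain k-means: random initial codewords, then alternate assign/update.
    C = X(randperm(size(X, 1), M), :);
    for iter = 1:50
        D = zeros(size(X, 1), M);
        for m = 1:M
            D(:, m) = sum((X - C(m, :)).^2, 2);
        end
        [~, a] = min(D, [], 2);                      % nearest codeword per frame
        for m = 1:M
            if any(a == m)
                C(m, :) = mean(X(a == m, :), 1);     % move codeword to cluster mean
            end
        end
    end
end
```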
This thesis implements a speaker/singer identification system in the MATLAB language. Feature extraction is carried out in MATLAB's Simulink environment, where the processing is specified graphically, so the analysis can be followed simply by understanding the signal-flow diagram.
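The Simulink block diagram itself cannot be reproduced here; purely as a hypothetical script analogue of the overall system, the following chains the two sketches above. The file names, codebook size (32), and neighbor count (5) are placeholders, not values taken from the thesis.

```matlab
% Hypothetical driver script for the two sketches above.
trainFiles = {'speaker1_train.wav', 'speaker2_train.wav', 'speaker3_train.wav'};
trainFeats = cell(size(trainFiles));
for s = 1:numel(trainFiles)
    [x, fs] = audioread(trainFiles{s});
    trainFeats{s} = extract_mfcc(x(:, 1), fs);       % MFCC frames for speaker s
end

[x, fs]  = audioread('unknown_utterance.wav');
testFeat = extract_mfcc(x(:, 1), fs);
spkId = identify_speaker(trainFeats, testFeat, 32, 5);
fprintf('Identified as enrolled speaker/singer #%d\n', spkId);
```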