Graduate Student: | 楊壁如 Pi-Ju Yang
---|---
Thesis Title: | 語者/歌者識別 Speaker/Singer Identification
Advisor: | 張智星 Jyh-Shing Roger Jang
Oral Defense Committee: |
Degree: | 博士 Doctor
Department: | 電機資訊學院 - 資訊工程學系 Computer Science (College of Electrical Engineering and Computer Science, Department of Computer Science)
Publication Year: | 2000
Graduation Academic Year: | 88 (academic year 1999-2000)
Language: | Chinese
Keywords (Chinese): | 語者識別、歌者識別、語者辨識、k-最近鄰居法則、特徵參數擷取、梅爾刻度式倒頻譜、語音訊號、線性識別分析
Keywords (English): | speaker identification, singer identification, speaker recognition, k-NN rule, feature extraction, mel-frequency cepstrum, speech signal, linear discriminant analysis
This thesis addresses the identification of speakers and singers. Whether a person is speaking or singing, the most important factor in letting a computer identify that person correctly is feature extraction: finding a set of feature parameters that represents each speaker's or singer's vocal characteristics, is not easily disturbed by the environment, and is robust enough to maintain a stable level of identification performance across different users and recording backgrounds.
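As an illustration of the mel-frequency cepstrum features referred to above, the following is a minimal MATLAB sketch of MFCC extraction. The frame length, hop size, filterbank size, and cepstrum order (25 ms / 10 ms / 20 filters / 12 coefficients) are assumed values chosen for the example, not the settings used in the thesis.

```matlab
% Minimal MFCC extraction sketch (illustrative settings, not the thesis').
% x: mono speech/singing signal, fs: sampling rate in Hz.
function mfcc = extract_mfcc(x, fs)
    x        = x(:);
    frameLen = round(0.025 * fs);                    % 25 ms analysis frame
    hop      = round(0.010 * fs);                    % 10 ms frame shift
    nFilt    = 20;                                   % mel filterbank channels
    nCep     = 12;                                   % cepstral coefficients kept
    nFFT     = 2^nextpow2(frameLen);

    win    = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));  % Hamming window
    melFB  = mel_filterbank(nFilt, nFFT, fs);
    dctMat = cos((1:nCep)' * ((1:nFilt) - 0.5) * pi / nFilt);     % DCT-II basis
    nFrame = floor((length(x) - frameLen) / hop) + 1;
    mfcc   = zeros(nFrame, nCep);

    for i = 1:nFrame
        seg = x((i-1)*hop + (1:frameLen)) .* win;    % windowed frame
        mag = abs(fft(seg, nFFT));                   % magnitude spectrum
        fbE = log(melFB * mag(1:nFFT/2+1) + eps);    % log mel-band energies
        mfcc(i, :) = (dctMat * fbE)';                % cepstrum via DCT
    end
end

function fb = mel_filterbank(nFilt, nFFT, fs)
    % Triangular filters spaced uniformly on the mel scale up to fs/2.
    mel   = @(f) 2595 * log10(1 + f/700);
    imel  = @(m) 700 * (10.^(m/2595) - 1);
    edges = imel(linspace(0, mel(fs/2), nFilt + 2)); % filter edge frequencies (Hz)
    bins  = floor(nFFT/2 * edges / (fs/2)) + 1;      % corresponding FFT bins
    fb    = zeros(nFilt, nFFT/2 + 1);
    for k = 1:nFilt
        lo = bins(k); mid = bins(k+1); hi = bins(k+2);
        fb(k, lo:mid) = linspace(0, 1, mid - lo + 1);   % rising edge
        fb(k, mid:hi) = linspace(1, 0, hi - mid + 1);   % falling edge
    end
end
```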
After the important features are selected, speaker/singer identification is carried out. We mainly use the k-nearest-neighbor (k-NN) rule as the basis for classification, but its computation time is long, so we adopt several data-reduction methods when training the speaker models. Data reduction consists of two parts: reducing the amount of data and reducing the data dimensionality. To reduce the amount of data we use vector-quantization methods such as k-means, fuzzy c-means, and learning vector quantization; to reduce the dimensionality we experiment with linear discriminant analysis.
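A minimal MATLAB sketch of this classification stage is given below: each enrolled speaker's (or singer's) training frames are first condensed into a small codebook with a plain k-means pass (one of the vector-quantization options listed above), and a test utterance is then identified by frame-level k-NN voting over the pooled codebooks. The function names, codebook size, and neighbor count are illustrative assumptions; the fuzzy c-means, LVQ, and LDA variants are not shown.

```matlab
% Sketch of k-NN identification over per-speaker k-means codebooks.
% trainFeats : cell array, trainFeats{s} is an (N_s x D) feature matrix
% testFeat   : (T x D) feature matrix of the unknown utterance
% M, k       : codebook size per speaker and number of nearest neighbors
function spkId = identify_speaker(trainFeats, testFeat, M, k)
    nSpk   = numel(trainFeats);
    D      = size(testFeat, 2);
    refX   = zeros(nSpk*M, D);                       % pooled codebook vectors
    refLab = zeros(nSpk*M, 1);                       % speaker label per codeword
    for s = 1:nSpk
        rows = (s-1)*M + (1:M);
        refX(rows, :) = simple_kmeans(trainFeats{s}, M);
        refLab(rows)  = s;
    end

    votes = zeros(nSpk, 1);
    for t = 1:size(testFeat, 1)
        d = sum((refX - testFeat(t, :)).^2, 2);      % squared distances
        [~, idx] = sort(d);
        lab = refLab(idx(1:k));                      % labels of k nearest codewords
        votes = votes + accumarray(lab, 1, [nSpk, 1]);  % frame-level votes
    end
    [~, spkId] = max(votes);                         % most-voted speaker wins
end

function C = simple_kmeans(X, M)
    % Plain k-means: random initial codewords, then alternate assign/update.
    C = X(randperm(size(X, 1), M), :);
    for iter = 1:50
        D = zeros(size(X, 1), M);
        for m = 1:M
            D(:, m) = sum((X - C(m, :)).^2, 2);
        end
        [~, a] = min(D, [], 2);                      % nearest codeword per frame
        for m = 1:M
            if any(a == m)
                C(m, :) = mean(X(a == m, :), 1);     % move codeword to cluster mean
            end
        end
    end
end
```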
This thesis implements a speaker/singer identification system in the MATLAB language. Feature extraction is carried out in MATLAB's Simulink environment, where the processing is specified graphically, so the analysis can be followed simply by understanding the signal-flow diagram.
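The Simulink block diagram itself cannot be reproduced here; purely as a hypothetical script analogue of the overall system, the following chains the two sketches above. The file names, codebook size (32), and neighbor count (5) are placeholders, not values taken from the thesis.

```matlab
% Hypothetical driver script for the two sketches above.
trainFiles = {'speaker1_train.wav', 'speaker2_train.wav', 'speaker3_train.wav'};
trainFeats = cell(size(trainFiles));
for s = 1:numel(trainFiles)
    [x, fs] = audioread(trainFiles{s});
    trainFeats{s} = extract_mfcc(x(:, 1), fs);       % MFCC frames for speaker s
end

[x, fs]  = audioread('unknown_utterance.wav');
testFeat = extract_mfcc(x(:, 1), fs);
spkId = identify_speaker(trainFeats, testFeat, 32, 5);
fprintf('Identified as enrolled speaker/singer #%d\n', spkId);
```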