Graduate student: | 蘇峻慶 Chun-Ching Su |
---|---|
Thesis title: | 錄音資料中語者切割與分群方法之研究 Speaker Segmentation and Clustering in Sound-Recording Data |
Advisor: | 王小川 Prof. Hsiao-Chuan Wang |
Oral defense committee: | |
Degree: | 碩士 Master |
Department: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
Year of publication: | 2005 |
Academic year of graduation: | 93 (ROC calendar) |
Language: | Chinese |
Number of pages: | 52 |
Keywords (Chinese): | 語音切割, 語音分群, 語者轉換點偵測 |
Keywords (English): | Speaker Segmentation, Speaker Clustering, Speaker Change Detection |
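This thesis studies speaker segmentation and clustering in sound recordings. Speech recorded in many settings contains more than one speaker, so the main goal of this thesis is to split a speech signal at the points where the speaker changes and then group together the segments spoken by the same speaker.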
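For speaker segmentation, the proposed method has three steps. First, the Bayesian information criterion (BIC) is used to locate the approximate positions of speaker change points. Second, a cross-detection method refines those positions. Third, each candidate is verified as a true change point. Experiments show that detection based on the generalized likelihood ratio (GLR) is fast but less accurate, while BIC-based detection is accurate but takes considerably longer. The proposed method takes slightly longer than GLR detection but far less time than BIC detection, and its detection performance is the best of the three, so it combines the low computational cost of GLR detection with the high accuracy of BIC detection.
For speaker clustering, each cluster's speaker model is a Gaussian mixture model (GMM). Each segment is scored against every cluster model by maximum likelihood to find the closest cluster, and a threshold then decides whether to merge the segment into that cluster or to start a new cluster. Experiments show that increasing the number of Gaussian mixture components improves clustering, with the best result reached at 16 components; adding more components does not improve clustering performance further. Another experiment shows that the more speakers the set of segments contains, the lower the overall clustering performance.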
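The BIC-based change detection mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes each segment is modeled by a single Gaussian with a diagonal covariance (a simplification chosen so the example needs only the standard library), and the names `log_det_diag`, `delta_bic`, `best_change_point`, `margin`, and `penalty_weight` are hypothetical. A positive score suggests that two Gaussians split at frame `t` explain the data better than one, after the BIC penalty for the extra parameters.

```python
import math


def log_det_diag(frames):
    """Log-determinant of the diagonal ML covariance of a list of feature vectors."""
    n = len(frames)
    total = 0.0
    for d in range(len(frames[0])):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(frames, t, penalty_weight=1.0):
    """BIC gain of modeling frames[:t] and frames[t:] separately instead of jointly.

    Assumption: one diagonal-covariance Gaussian per segment, so the split
    model has 2 * dim extra parameters (one mean and one variance per dimension).
    """
    n, dim = len(frames), len(frames[0])
    left, right = frames[:t], frames[t:]
    # Log-likelihood gain of the two-model hypothesis (GLR-style term).
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    # BIC penalty for the extra parameters of the second Gaussian.
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def best_change_point(frames, margin=10, penalty_weight=1.0):
    """Return (t, score) for the best split, or (None, score) if no split is accepted.

    `margin` keeps at least that many frames on each side of the split.
    """
    candidates = range(margin, len(frames) - margin + 1)
    if len(candidates) == 0:
        return None, float("-inf")
    score, t = max((delta_bic(frames, t, penalty_weight), t) for t in candidates)
    return (t, score) if score > 0 else (None, score)
```

Calling `best_change_point(frames)` on a window of feature vectors (for example, cepstral coefficients per frame) returns the most likely change index within that window. Evaluating every candidate index is what makes a pure BIC search slow, which is consistent with the abstract's remark that BIC detection is accurate but time-consuming.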