| Graduate Student | 劉賾暵 Liu, Tse-Han |
|---|---|
| Thesis Title | 結合韻律與頻譜資訊的語言辨認 (Language Identification Based on the Combination of Rhythmic and Spectral Information) |
| Advisor | 王小川 Wang, Hsiao-Chuan |
| Committee Members | |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2009 |
| Academic Year of Graduation | 97 (ROC calendar) |
| Language | Chinese |
| Number of Pages | 61 |
| Keywords (Chinese) | 自動化語言辨認、韻律資訊、頻譜資訊、虛擬音節、序列核函數、支持向量機、語言相關權重計算 |
| Keywords (English) | automatic language identification, rhythmic information, spectral information, pseudo syllable, sequence kernel, support vector machine, language dependent weighting |
Advances in transportation technology have shortened the distance between countries and extended people's daily activities around the globe, which in turn creates problems caused by language differences. Emergency calls are one example: a tragedy may occur simply because the operator cannot quickly understand the caller's language. Since any one person can understand only a limited number of languages, such tragedies could be reduced if a machine first identified the caller's language and then routed the call to an operator who speaks it.
This thesis addresses automatic language identification without phonetic transcription, and improves identification performance by combining rhythmic and spectral information. The rhythmic system builds its language models from pseudo-syllable durations combined with a bigram model. The spectral system extracts shifted delta cepstral coefficients and models them with a support vector machine classifier based on a sequence kernel. In the back end, a language-dependent weighting scheme combines the outputs of the two systems, so that the scores from different language models and different systems can be weighted in a more flexible way.
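The abstract gives no implementation details, so the following is only a minimal sketch, in Python with NumPy, of the shifted delta cepstral (SDC) feature computation used by the spectral system (cf. [16]). The edge-frame handling and the default d, P, k parameter values are assumptions for illustration, not values taken from the thesis.

```python
import numpy as np

def sdc(cepstra, d=1, P=3, k=7):
    """Shifted delta cepstral (SDC) features.

    cepstra : (T, N) array of per-frame cepstral coefficients.
    Returns a (T, N*k) array obtained by stacking k delta blocks,
    where block i is c[t + i*P + d] - c[t + i*P - d].
    Frame indices falling outside the utterance are clamped to the
    first/last frame (an assumed boundary policy).
    """
    T, _ = cepstra.shape
    t = np.arange(T)
    blocks = []
    for i in range(k):
        ahead = np.clip(t + i * P + d, 0, T - 1)   # frames d ahead of the i-th shift
        behind = np.clip(t + i * P - d, 0, T - 1)  # frames d behind the i-th shift
        blocks.append(cepstra[ahead] - cepstra[behind])
    return np.concatenate(blocks, axis=1)
```

The resulting SDC frames would then be scored by the sequence-kernel SVM mentioned above (see [11], [12]); that classifier is not sketched here.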
The experiments compare utterances of different lengths. Using the rhythmic system alone, the identification rate is 89% for long utterances, 59% for medium utterances, and 35% for short utterances. Using the spectral system alone, the rates are 96%, 88%, and 59%, respectively. After back-end fusion, the rate for long utterances reaches 100%, an absolute improvement of 4% over the spectral system alone; for medium utterances it reaches 94%, a 6% gain; and for short utterances it reaches 62%, 3% better than before fusion. The proposed method therefore performs well on medium and long utterances for automatic language identification without labeled corpora, and combining rhythmic and spectral information through language-dependent weighting does indeed improve identification accuracy.
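To make the back-end combination concrete, here is a minimal sketch of a language-dependent weighted score fusion. The linear fusion rule, the dictionary-based interface, and the example numbers are assumptions for illustration; the thesis (following [15]) may estimate and apply the weights differently.

```python
def fuse_scores(rhythm_scores, spectral_scores, weights):
    """Fuse per-language scores from the rhythmic and spectral systems.

    rhythm_scores, spectral_scores : dict mapping language -> score.
    weights : dict mapping language -> (w_rhythm, w_spectral); using a
              separate weight pair per language is what makes the
              fusion "language dependent".
    Returns the identified language (highest fused score).
    """
    fused = {
        lang: weights[lang][0] * rhythm_scores[lang]
              + weights[lang][1] * spectral_scores[lang]
        for lang in rhythm_scores
    }
    return max(fused, key=fused.get)


# Hypothetical usage: the scores and weights below are made-up numbers.
rhythm = {"en": -1.2, "zh": -0.8, "ja": -1.5}
spectral = {"en": -0.4, "zh": -0.9, "ja": -1.1}
w = {"en": (0.3, 0.7), "zh": (0.4, 0.6), "ja": (0.3, 0.7)}
print(fuse_scores(rhythm, spectral, w))  # -> "en"
```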
[1] J.T. Foil, “Language identification using noisy speech,” in IEEE ICASSP, Tokyo, 1986, pp. 861-864.
[2] F.J. Goodman, A.F. Martin, and R.E. Wohlford, “Improved automatic language identification in noisy speech,” in IEEE ICASSP, 1989, pp. 528-531.
[3] M.A. Zissman, “Automatic language identification using Gaussian mixture and hidden Markov models,” in IEEE ICASSP, Vol. 2, 1993, pp. 399-402.
[4] Y.K. Muthusamy, N. Jain, and R.A. Cole, “Perceptual benchmarks for automatic language identification,” in IEEE ICASSP, Vol. 1, 1994, pp. 333-336.
[5] S. Itahashi, J.X. Zhou, and K. Tanaka, “Spoken language discrimination using speech fundamental frequency,” in Proc. ICSLP '94, Yokohama, Japan, 1994, pp. 1899-1902.
[6] C.Y. Lin and H.C. Wang, “Language identification using pitch information,” in Proc. ICASSP 2005, Philadelphia, USA, Vol. 1, 2005, pp. 601-604.
[7] C.Y. Lin and H.C. Wang, “Language identification using pitch information in the ergodic Markov model,” to appear in Proc. ICASSP 2006.
[8] F. Ramus and J. Mehler, “Language identification with suprasegmental cues: A study based on speech resynthesis,” Journal of the Acoustical Society of America, 105(1), pp. 512-521, 1999.
[9] M.A. Zissman and E. Singer, “Automatic language identification of telephone speech messages using phoneme recognition and N-gram modeling,” in IEEE ICASSP, Vol. 1, 1994, pp. 305-308.
[10] E. Timoshenko and H. Hoege, “Using speech rhythm for acoustic language identification,” in Interspeech, 2007, pp. 182-185.
[11] W.M. Campbell, “Generalized linear discriminant sequence kernels for speaker recognition,” in IEEE ICASSP, 2002, pp. I-161-I-164.
[12] W. Zhang, B. Li, D. Qu, and B. Wang, “Automatic language identification using support vector machines,” in IEEE ICASSP 2006 Proceedings.
[13] J. Louradour, “A new sequence kernel and its application to speaker verification,” IRIT research report, 2005.
[14] J. Farinas and F. Pellegrino, “Automatic rhythm modeling for language identification,” in Proc. Eurospeech, Scandinavia, 2001.
[15] B. Yin, E. Ambikairajah, and F. Chen, “A novel weighting technique for combining likelihood scores in language identification systems,” in IEEE ICICS, 2007.
[16] P.A. Torres-Carrasquillo, E. Singer, M.A. Kohler, R.J. Greene, D.A. Reynolds, and J.R. Deller, Jr., “Approaches to language identification using Gaussian mixture models and shifted delta cepstral features,” in Proc. International Conference on Spoken Language Processing, 2002, pp. 89-92.
[17] N. Cristianini and J. Shawe-Taylor, Support Vector Machines, Cambridge University Press, Cambridge, 2000.
[18] P. Moreno and P. Ho, “A new SVM approach to speaker identification and verification using probabilistic distance kernels,” in Proc. Eurospeech, 2003.
[19] R. Kondor and T. Jebara, “A kernel between sets of vectors,” in Proc. ICML, 2003.
[20] T. Jaakkola and D. Haussler, “Exploiting generative models in discriminative classifiers,” Advances in Neural Information Processing Systems 11, 1998.
[21] V. Wan and S. Renals, “Speaker verification using sequence discriminant support vector machines,” IEEE Trans. on Speech and Audio Processing, 2004.
[22] Y.K. Muthusamy, R.A. Cole, and B.T. Oshika, “The OGI multi-language telephone speech corpus,” in Proc. ICSLP '92, Vol. 2, pp. 895-898, October 1992.