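Graduate student: | 張智傑 Zhi-Jie Chang |
---|---|
Thesis title: | 以高斯混合模型表徵器與語言模型為基礎之語言辨認研究 Language Identification based on Gaussian Mixture Model Tokenizer and Language Model |
Advisor: | 王小川 Hsiao-Chuan Wang |
Oral defense committee: | |
Degree: | 碩士 Master |
Department: | 電機資訊學院 - 電機工程學系 College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2005 |
Academic year of graduation: | 93 (ROC calendar) |
Language: | Chinese |
Number of pages: | 69 |
Chinese keywords: | 語言辨認 (language identification), 高斯混合模型 (Gaussian mixture model), 表徵器 (tokenizer), 語言模型 (language model), 連結聲學語言模型 (joint acoustic-language model) |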
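With globalization, people from different linguistic backgrounds increasingly live and work together, and the differences between their languages become a practical problem. Automated services such as hotel reservations, airline booking, train ticketing, and hospital registration all face the challenge of serving users who speak many different languages.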
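This thesis investigates automatic language identification methods that do not require labeled data. The methods are built on two basic models, a Gaussian mixture model (GMM) tokenizer and a language model, and are supplemented by segmentation and back-end processing to identify the language of speech data. Two system architectures are studied: the “GMM tokenizer-language model” method, which cascades a GMM tokenizer with a language model, and the “joint acoustic-language model” method, which integrates the language model into the tokenizer.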
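The cascaded architecture can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM whose most likely component index serves as the token for each frame, and an add-alpha smoothed bigram language model over those tokens; the function names, the smoothing choice, and the toy parameters are illustrative assumptions, not details taken from the thesis.

```python
import math
from collections import Counter

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def tokenize(frames, weights, means, variances):
    """Map each frame to the index of its highest-scoring mixture component."""
    tokens = []
    for x in frames:
        scores = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens

def train_bigram(tokens, n_symbols, alpha=1.0):
    """Add-alpha smoothed bigram log-probabilities, indexed as lm[prev][cur]."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    return [
        [
            math.log((pair_counts[(p, c)] + alpha)
                     / (prev_counts[p] + alpha * n_symbols))
            for c in range(n_symbols)
        ]
        for p in range(n_symbols)
    ]

def bigram_score(tokens, lm):
    """Average bigram log-likelihood of a token sequence under one language model."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(lm[p][c] for p, c in pairs) / max(len(pairs), 1)

# Toy example (hypothetical values): a 2-component, 1-dimensional GMM tokenizer.
weights = [0.5, 0.5]
means = [[0.0], [5.0]]
variances = [[1.0], [1.0]]

# One bigram model per language, trained on that language's token sequences.
lm_a = train_bigram([0, 1, 0, 1, 0, 1], n_symbols=2)  # alternating tokens
lm_b = train_bigram([0, 0, 0, 1, 1, 1], n_symbols=2)  # long runs of one token

test_tokens = tokenize([[0.1], [4.9], [0.2], [5.1]], weights, means, variances)
score_a = bigram_score(test_tokens, lm_a)
score_b = bigram_score(test_tokens, lm_b)
```

In this sketch the utterance is assigned to the language whose bigram model gives the higher score. No transcriptions are needed at any stage, since the tokens come from the GMM itself.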
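Experiments show that, for the “GMM tokenizer-language model” method, 38-dimensional mel-frequency cepstral coefficients perform best among the feature sets tested; multiple tokenizers outperform a single tokenizer; and arithmetic averaging performs best among the back-end processors.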
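A multi-tokenizer back end based on arithmetic averaging might look like the sketch below. It assumes that each tokenizer produces one score per candidate language and that the back end averages these scores across tokenizers before picking the best language; the exact quantity averaged in the thesis is not stated here, and the names and scores are hypothetical.

```python
def fuse_arithmetic_mean(scores_per_tokenizer):
    """Average each language's score over all tokenizers."""
    languages = scores_per_tokenizer[0].keys()
    n = len(scores_per_tokenizer)
    return {
        lang: sum(scores[lang] for scores in scores_per_tokenizer) / n
        for lang in languages
    }

def identify(scores_per_tokenizer):
    """Return the language with the highest fused score."""
    fused = fuse_arithmetic_mean(scores_per_tokenizer)
    return max(fused, key=fused.get)

# Hypothetical log-likelihood scores from two tokenizers for one utterance.
scores = [
    {"en": -1.0, "zh": -2.0},  # tokenizer 1 prefers "en"
    {"en": -3.0, "zh": -1.0},  # tokenizer 2 prefers "zh"
]
fused = fuse_arithmetic_mean(scores)
decision = identify(scores)
```

Averaging lets a confident tokenizer outweigh a marginal one, which is one plausible reason a multi-tokenizer system can beat a single tokenizer.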
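For the “joint acoustic-language model” method, with segmented speech data, a tokenizer with 128 mixtures achieves an identification rate close to that obtained with 256 mixtures. This indicates that segmentation allows the number of mixtures to be reduced substantially.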
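Because the “joint acoustic-language model” method bases its decision on the acoustic similarity of the models, some languages may yield systematically high similarity scores. The experiments therefore also take a bias value into account.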
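One way to picture bias removal is the sketch below. The abstract does not say how the bias value is computed, so this sketch assumes, purely for illustration, that each language's bias is its mean score over a set of development utterances and that the bias is subtracted before the decision; all names and numbers are hypothetical.

```python
def estimate_bias(dev_scores):
    """Per-language mean score over development utterances (assumed bias)."""
    languages = dev_scores[0].keys()
    return {
        lang: sum(utt[lang] for utt in dev_scores) / len(dev_scores)
        for lang in languages
    }

def identify_with_bias_removal(scores, bias):
    """Subtract each language's bias, then pick the highest adjusted score."""
    adjusted = {lang: s - bias[lang] for lang, s in scores.items()}
    return max(adjusted, key=adjusted.get)

# Hypothetical development scores: the "en" model scores high on everything.
dev_scores = [
    {"en": -1.0, "de": -3.0},
    {"en": -1.0, "de": -5.0},
]
bias = estimate_bias(dev_scores)

test_scores = {"en": -1.5, "de": -3.0}
raw_decision = max(test_scores, key=test_scores.get)
adjusted_decision = identify_with_bias_removal(test_scores, bias)
```

Here the raw scores favor the model that is high for every utterance, while the adjusted scores favor the model that is unusually high for this particular utterance.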
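The experimental results show that segmentation improves identification performance. For 3-language and 11-language identification, the “GMM tokenizer-language model” method performs better; for 6-language identification, the “joint acoustic-language model” method with bias removal performs better.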