Graduate student: | 周哲玄 Che-Hsuan Chou |
---|---|
Thesis title: | 台語關鍵詞辨識之實作與比較 Implementation and Comparison of Keyword Spotting for Taiwanese |
Advisor: | 張智星 |
Oral defense committee: | 呂仁園, 江永進 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2012 |
Academic year of graduation: | 100 |
Language: | Chinese |
Number of pages: | 50 |
Keywords (Chinese): | 關鍵詞辨識, 隱藏式馬可夫模型, 懲罰矩陣 |
Keywords (English): | Keyword spotting, hidden Markov model, penalty matrix |
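This thesis investigates how combining speech assessment and tone recognition can improve the recognition rate of a Taiwanese keyword spotting system. In the first stage, the system is implemented with several different methods; in the second stage, speech assessment and a pitch contour classifier are used for verification to improve recognition performance.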
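In the first stage, two methods are used to implement keyword spotting. The first is the hidden Markov model (HMM). The second is phone mismatch, which comes in three variants: the penalty matrix (PM), the confusion matrix (CM), and the Levenshtein distance matrix (LD). In the second stage, the candidate keywords recognized by these two methods are verified by speech assessment and a pitch contour classifier. A threshold is set for each of the two verification methods, and a decision tree applies the two thresholds to verify the keywords.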
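The second-stage verification described above can be pictured as a small two-level decision rule. The sketch below is only illustrative: the `Candidate` fields, the score ranges, the order of the two checks, and both threshold values are assumptions made for the example, not details taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate keyword from the first stage (all fields are hypothetical)."""
    keyword: str
    assessment_score: float  # assumed pronunciation score in the range 0..100
    tone_match: float        # assumed fraction of syllables whose pitch contour class matches


def verify(candidate: Candidate,
           score_threshold: float = 60.0,
           tone_threshold: float = 0.5) -> bool:
    """Accept a candidate only if it passes both thresholds in turn.

    This is a hand-written two-level decision rule; a trained decision
    tree could choose the thresholds and the test order from data.
    """
    # Level 1: reject candidates whose speech assessment score is too low.
    if candidate.assessment_score < score_threshold:
        return False
    # Level 2: of the remaining candidates, keep those whose tones agree.
    return candidate.tone_match >= tone_threshold


if __name__ == "__main__":
    good = Candidate("keyword_a", assessment_score=82.0, tone_match=0.75)
    bad_tone = Candidate("keyword_b", assessment_score=82.0, tone_match=0.25)
    print(verify(good), verify(bad_tone))
```

Checking the assessment score first means the tone test only runs on candidates that already sound plausible, which is one reasonable ordering among several.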
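Experimental results show that the hidden Markov model has an equal error rate (ERR) of 46.5%. Adding speech assessment as verification lowers the ERR to 26.5%, and further adding the pitch contour classifier lowers the FAR by another 1.8 percentage points, to 24.7%. The phone mismatch methods have ERRs of 39.4% (penalty matrix), 34.0% (confusion matrix), and 42.2% (distance matrix), the best being 34.0%. Adding assessment verification lowers the ERR to 34.6% for PM and 28.4% for CM, and further adding pitch contour classifier verification lowers it to 33.7% for PM and 27.3% for CM. Speech assessment and the pitch contour classifier are therefore very helpful for verification in a Taiwanese keyword spotting system.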
This thesis focuses on improving the performance of a Taiwanese keyword spotting system by integrating speech assessment and pitch contour classification. In the first part of this research, we implement a Taiwanese keyword spotting system using several different methods. In the second part, we improve the system by validating its output with speech assessment and pitch contour classification.
In the first part, two methods are adopted to implement the keyword spotting system. The first uses the hidden Markov model (HMM), while the second uses phone mismatching. The phone mismatching method is further divided into three algorithms: penalty matrix (PM), confusion matrix (CM), and Levenshtein matrix (LD). In the second part, we apply speech assessment and pitch contour classification to validate the candidate keywords selected by these two methods. A threshold is set for each of the two validation methods, and a decision tree combines the two thresholds to make the final decision.
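To make the phone mismatching idea concrete, the sketch below scores a keyword's phone sequence against a decoded phone sequence with a weighted edit distance. It is a minimal illustration under stated assumptions: the function name, the insertion and deletion costs, and the example penalty values are hypothetical, and the thesis may estimate and apply its matrices differently. With an empty penalty table and unit costs the function reduces to the plain Levenshtein distance; supplying per-pair substitution penalties gives the flavor of a penalty or confusion matrix.

```python
from typing import Dict, Sequence, Tuple

# Maps (keyword phone, decoded phone) to a substitution penalty.
PenaltyTable = Dict[Tuple[str, str], float]


def phone_mismatch_cost(keyword: Sequence[str],
                        decoded: Sequence[str],
                        sub_penalty: PenaltyTable,
                        ins_cost: float = 1.0,
                        del_cost: float = 1.0,
                        default_sub: float = 1.0) -> float:
    """Weighted edit distance between two phone sequences."""
    n, m = len(keyword), len(decoded)
    # dp[i][j] = cheapest alignment of keyword[:i] with decoded[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = keyword[i - 1], decoded[j - 1]
            # Identical phones cost nothing; otherwise look up the pair,
            # falling back to a uniform penalty for unlisted pairs.
            sub = 0.0 if a == b else sub_penalty.get((a, b), default_sub)
            dp[i][j] = min(dp[i - 1][j - 1] + sub,   # match or substitute
                           dp[i - 1][j] + del_cost,  # keyword phone missing
                           dp[i][j - 1] + ins_cost)  # extra decoded phone
    return dp[n][m]


if __name__ == "__main__":
    # Hypothetical penalty: confusing "t" with "d" is cheap.
    penalties = {("t", "d"): 0.2}
    print(phone_mismatch_cost(["t", "a", "i"], ["d", "a", "i"], penalties))
```

A lower cost means a closer match, so a candidate would be accepted when its cost falls below a chosen threshold. Note that this compares two whole sequences; locating a keyword inside a longer decoded stream would additionally need a search over start and end positions.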
Experimental results show that the HMM method achieves an equal error rate (ERR) of 46.5%. The ERR falls to 26.5% when speech assessment validation is added to the HMM method, and the false acceptance rate (FAR) falls further to 24.7% when pitch contour classification is added as well. In the phone mismatch experiments, PM, CM, and LD achieve ERRs of 39.4%, 34.0%, and 42.2%, respectively. With speech assessment validation, the ERRs fall to 34.6% for PM and 28.4% for CM; with pitch contour classification added, they fall further to 33.7% for PM and 27.3% for CM. We conclude that validation using speech assessment and pitch contour classification improves the performance of Taiwanese keyword spotting.
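For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate equals the false rejection rate (FRR). The sketch below shows one common way to estimate it from two lists of detection scores. The function names and the toy scores are assumptions for illustration only; they are not the thesis's data or evaluation code.

```python
from typing import Sequence, Tuple


def far_frr(target_scores: Sequence[float],
            impostor_scores: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    # FAR: share of non-keyword segments that were wrongly accepted.
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    # FRR: share of true keyword segments that were wrongly rejected.
    frr = sum(s < threshold for s in target_scores) / len(target_scores)
    return far, frr


def equal_error_rate(target_scores: Sequence[float],
                     impostor_scores: Sequence[float]) -> Tuple[float, float]:
    """Sweep every observed score as a threshold and return
    (estimated equal error rate, threshold) where FAR and FRR are closest."""
    best_gap, best_rate, best_threshold = None, 0.0, 0.0
    for t in sorted(set(target_scores) | set(impostor_scores)):
        far, frr = far_frr(target_scores, impostor_scores, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # With finite data FAR and FRR rarely coincide exactly,
            # so report their average at the closest point.
            best_gap, best_rate, best_threshold = gap, (far + frr) / 2, t
    return best_rate, best_threshold


if __name__ == "__main__":
    # Toy scores, invented for this example.
    targets = [0.9, 0.8, 0.7, 0.4]
    impostors = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(targets, impostors))
```

Because a single number summarizes the trade-off between the two error types, it gives a convenient way to compare systems such as the HMM and phone mismatch variants above.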