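Graduate student: | 徐偉棠 |
---|---|
Thesis title: | 以語音辨識技術輔助英文母音發音之偵錯 Error-Spotting in Pronunciation of English Vowels based on Speech Recognition Technologies |
Advisor: | 張智星 Jyh-Shing Roger Jang |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2005 |
Academic year of graduation: | 93 (ROC calendar) |
Language: | English |
Number of pages: | 44 |
Chinese keywords: | 語音辨識 、共振峰 、語音評分 |
English keywords: | speech recognition, formant, speech assessment |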
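An English pronunciation learning system draws on both audio signal processing and speech recognition. This thesis focuses on detecting pronunciation errors in English vowels. We propose a method for mispronunciation detection and learning that does not require a target utterance. Many speech studies have noted the importance of formant frequency coefficients, so we investigate integrating them with the Mel-frequency cepstral coefficients (MFCC) conventionally used in speech recognition within a hidden Markov model (HMM), with the aim of improving the accuracy of both speech recognition and error detection. In addition, we propose a pronunciation confusion network to predict pronunciation errors, and then compute a confidence score for each suspected phoneme to improve the accuracy of mispronunciation detection. The articulatory properties of formants provide the basis for the feedback given to the learner.
Finally, we design several experiments to demonstrate the feasibility and performance of each method.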
This thesis investigates methods for detecting pronunciation errors in English vowels in utterances spoken by L2 learners, a task that draws on techniques from digital signal processing and speech recognition. We propose a text-independent approach (one that does not require a target utterance) for English vowel error detection and learning. Various studies in formant-based speech synthesis have suggested the importance of formant coefficients; this motivates us to investigate pronunciation assessment using formant information in addition to MFCC (Mel-frequency cepstral coefficients), rather than MFCC alone.
In particular, we explore adding formant information to improve the recognition rates of hidden Markov models (HMMs). We then propose the use of a PCN (pronunciation confusion network) together with a formant-based confidence measure to raise error detection rates. Phonological knowledge about formants and the articulators is then employed to generate high-level feedback for the user. Experimental results demonstrate that automatic generation of reliable pronunciation instruction (without using a target utterance) is highly feasible.
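As a rough illustration of how formant measurements could be turned into articulatory feedback, the sketch below compares a learner's measured F1 and F2 against reference values for the intended vowel. It is not the thesis's implementation: the function name `formant_feedback`, the 15% tolerance, the phone labels, and the reference table (approximate adult-male averages in Hz) are all assumptions made for this example. It relies only on the general phonetic tendencies that F1 rises as the tongue lowers and F2 rises as the tongue moves forward.

```python
# Hypothetical reference formants (F1, F2) in Hz. These are approximate
# adult-male averages chosen for illustration, not values from the thesis.
REFERENCE_FORMANTS = {
    "iy": (270, 2290),  # as in "beet"
    "ih": (390, 1990),  # as in "bit"
    "eh": (530, 1840),  # as in "bet"
    "ae": (660, 1720),  # as in "bat"
    "aa": (730, 1090),  # as in "father"
    "uw": (300, 870),   # as in "boot"
}


def formant_feedback(target, f1, f2, tolerance=0.15):
    """Return articulatory hints for a measured (f1, f2) against a target vowel.

    `tolerance` is the relative deviation allowed before a hint is issued
    (an assumed threshold, not one taken from the thesis).
    """
    ref_f1, ref_f2 = REFERENCE_FORMANTS[target]
    hints = []

    # F1 correlates inversely with tongue height: a high F1 means the
    # tongue is too low (mouth too open) for the target vowel.
    if f1 > ref_f1 * (1 + tolerance):
        hints.append("raise the tongue (close the jaw slightly)")
    elif f1 < ref_f1 * (1 - tolerance):
        hints.append("lower the tongue (open the jaw slightly)")

    # F2 correlates with tongue frontness: a high F2 means the tongue
    # is too far forward for the target vowel.
    if f2 > ref_f2 * (1 + tolerance):
        hints.append("move the tongue further back")
    elif f2 < ref_f2 * (1 - tolerance):
        hints.append("move the tongue further forward")

    return hints or ["vowel is within tolerance of the target"]
```

For instance, calling `formant_feedback("iy", 390, 1990)` models a learner who aims for the vowel in "beet" but produces something closer to "bit"; with these assumed values only the F1 deviation exceeds the tolerance, so the single hint is to raise the tongue. A real system would also need speaker normalization, since absolute formant frequencies vary with vocal tract length.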