Author: 徐偉棠
Thesis title: Error-Spotting in Pronunciation of English Vowels based on Speech Recognition Technologies (以語音辨識技術輔助英文母音發音之偵錯)
Advisor: Jyh-Shing Roger Jang (張智星)
Degree: Master's
Department: Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar, i.e., 2004-2005)
Language: English
Number of pages: 44
Keywords: speech recognition, formant, speech assessment
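    English speech-learning systems combine audio signal processing with speech recognition technology. This thesis focuses on detecting pronunciation errors in English vowels. We propose a method for mispronunciation detection and learning that does not require a target utterance. Many phonetic studies have stressed the importance of formant frequency coefficients, so we integrate these coefficients with the Mel-frequency cepstral coefficients (MFCCs) traditionally used in speech recognition within a hidden Markov model (HMM), with the aim of improving the accuracy of both speech recognition and error detection. We also propose a pronunciation confusion network that predicts likely pronunciation errors, and we then compute a confidence score for each hypothesized erroneous phone to improve mispronunciation detection accuracy. The articulatory properties of formants serve as the basis for the feedback given to learners.
    Finally, we design several experiments to demonstrate the feasibility and performance of each method.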
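    The pronunciation-confusion-network step can be illustrated with a small Python sketch. It assumes a hypothetical confusion table (`CONFUSIONS`), CMU-style phone labels, and per-slot log-likelihood scores that would in practice come from an acoustic model aligned to the learner's speech; none of these names or values are taken from the thesis. Each canonical phone becomes a slot holding that phone plus its likely substitutes, and because the slots are independent, the best path is simply the highest-scoring phone in each slot.

```python
# Hypothetical confusion table: canonical vowel -> vowels a learner might
# produce instead (illustrative only, not taken from the thesis).
CONFUSIONS = {
    "iy": ["ih"],
    "ih": ["iy", "eh"],
    "eh": ["ae", "ih"],
    "ae": ["eh"],
}


def build_pcn(canonical):
    """One slot per canonical phone: the phone itself first, then its substitutes."""
    return [[p] + CONFUSIONS.get(p, []) for p in canonical]


def decode_pcn(pcn, slot_scores):
    """Choose the best-scoring phone in each slot.

    slot_scores[i] maps each phone in slot i to a log-likelihood, assumed to
    come from an acoustic model over a fixed phone segmentation. Slots are
    independent, so the best path is the per-slot maximum; on ties, max()
    keeps the first candidate, i.e. the canonical phone.
    """
    return [max(slot, key=lambda p: slot_scores[i][p]) for i, slot in enumerate(pcn)]


def predicted_errors(canonical, decoded):
    """List (position, expected, produced) for every substituted phone."""
    return [(i, c, d) for i, (c, d) in enumerate(zip(canonical, decoded)) if c != d]


# Example: "ship" /sh ih p/, where the learner's vowel scores best as /iy/.
canonical = ["sh", "ih", "p"]
pcn = build_pcn(canonical)
scores = [{"sh": -5.0}, {"ih": -12.0, "iy": -8.5, "eh": -15.0}, {"p": -4.0}]
print(predicted_errors(canonical, decode_pcn(pcn, scores)))  # → [(1, 'ih', 'iy')]
```

    Letting ties fall to the canonical phone keeps the sketch conservative: a substitution is only predicted when an alternative strictly outscores the expected phone, and a confidence measure could then accept or reject that prediction.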


    This thesis investigates a method for detecting mispronunciations of English vowels in utterances spoken by L2 learners, drawing on techniques from digital signal processing and speech recognition. We propose a text-independent approach (one that does not require a target utterance) to English vowel error detection and learning. Various studies in formant-based speech synthesis have suggested the importance of formant coefficients; this motivates us to investigate pronunciation assessment using formant information rather than MFCCs (Mel-frequency cepstral coefficients) alone.
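    One simple way to use formant information alongside MFCCs is to append each frame's formant frequencies to that frame's MFCC vector before HMM training. The Python sketch below assumes formant tracks (F1 to F3, in Hz) already aligned frame by frame with the MFCCs, for example from a formant tracker, and rescales them to kHz so their magnitudes sit closer to cepstral values. The function name `fuse_features` and the scaling are illustrative assumptions, not the thesis's exact feature design.

```python
def fuse_features(mfcc_frames, formant_frames, scale_hz=1000.0):
    """Append each frame's formant frequencies to its MFCC vector.

    mfcc_frames:    list of MFCC vectors, one per analysis frame.
    formant_frames: list of (F1, F2, F3) tuples in Hz for the same frames.
    scale_hz:       divisor that puts formants in kHz so their magnitudes are
                    closer to those of cepstral coefficients (an assumption).
    """
    if len(mfcc_frames) != len(formant_frames):
        raise ValueError("MFCC and formant tracks must have the same number of frames")
    return [
        list(mfcc) + [f / scale_hz for f in formants]
        for mfcc, formants in zip(mfcc_frames, formant_frames)
    ]


# Two frames of toy 2-dimensional "MFCCs", each with three formants.
fused = fuse_features([[1.0, 2.0], [3.0, 4.0]], [(500.0, 1500.0, 2500.0), (700.0, 1200.0, 2600.0)])
print(fused[0])  # → [1.0, 2.0, 0.5, 1.5, 2.5]
```

    Plain concatenation keeps a single observation vector per frame, so the fused vectors can be fed to an ordinary HMM trainer unchanged; weighting or normalizing the formant dimensions differently would be a separate design decision.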
    In particular, we explore adding formant information to improve the recognition rates of HMMs. We then propose a PCN (pronunciation confusion network), together with a formant-based confidence measure, to raise error detection rates. Phonological knowledge relating formants to the articulators is then employed to generate high-level feedback for the user. Experimental results demonstrate that automatically generating reliable pronunciation instruction, without a target utterance, is feasible.
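    The formant-based confidence measure and feedback step might look roughly like the following Python sketch, which is a simplification under stated assumptions: a single diagonal Gaussian over (F1, F2) per vowel stands in for the per-phone GMMs, the parameters in `VOWEL_MODELS` are illustrative placeholders rather than trained values, and the rank threshold `max_rank` is a hypothetical parameter. A vowel is flagged when its expected model does not rank within the top `max_rank` models for the observed formants; the feedback relies on the standard phonetic relations that F1 rises as the tongue lowers and F2 rises as the tongue moves forward.

```python
import math

# Hypothetical per-vowel formant models: ((mean F1, mean F2), (std F1, std F2)) in Hz.
# Illustrative placeholders only; a real system would train these on native speech.
VOWEL_MODELS = {
    "iy": ((300.0, 2300.0), (50.0, 200.0)),
    "ih": ((400.0, 2000.0), (50.0, 200.0)),
    "eh": ((550.0, 1800.0), (60.0, 200.0)),
    "ae": ((700.0, 1700.0), (70.0, 200.0)),
    "aa": ((750.0, 1100.0), (70.0, 150.0)),
    "uw": ((320.0, 900.0), (50.0, 150.0)),
}


def log_likelihood(x, mean, std):
    """Log-likelihood of formant vector x under a diagonal Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - (xi - m) ** 2 / (2 * s * s)
        for xi, m, s in zip(x, mean, std)
    )


def rank_of(expected, x, models=VOWEL_MODELS):
    """Rank (1 = best) of the expected vowel among all vowel models."""
    scores = {v: log_likelihood(x, m, s) for v, (m, s) in models.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(expected) + 1


def assess(expected, x, max_rank=1, models=VOWEL_MODELS):
    """Return (is_error, feedback) for one vowel segment with formants x = (F1, F2)."""
    if rank_of(expected, x, models) <= max_rank:
        return False, []
    (m1, m2), (s1, s2) = models[expected]
    f1, f2 = x
    tips = []
    # F1 is inversely related to tongue height: a high F1 means the tongue is too low.
    if f1 > m1 + s1:
        tips.append("raise your tongue (close your jaw slightly)")
    elif f1 < m1 - s1:
        tips.append("lower your tongue (open your jaw slightly)")
    # F2 tracks tongue frontness: a low F2 means the tongue is too far back.
    if f2 > m2 + s2:
        tips.append("move your tongue back")
    elif f2 < m2 - s2:
        tips.append("move your tongue forward")
    return True, tips


print(assess("iy", (420.0, 1980.0)))
# → (True, ['raise your tongue (close your jaw slightly)', 'move your tongue forward'])
```

    Here an /iy/ produced with formants close to the placeholder /ih/ model ranks third, so it is flagged, and the F1/F2 deviations are turned into articulatory advice. Raising `max_rank` makes the check more lenient, which is the kind of trade-off a threshold experiment would tune.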

    Table of contents:
    ABSTRACT (CHINESE)
    ABSTRACT (ENGLISH)
    CONTENTS
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTER 1 INTRODUCTION
      1.1 BACKGROUND
      1.2 AUTOMATIC SPEECH RECOGNITION FOR CAPT
      1.3 RESEARCH TOPIC
      1.4 RELATED WORK
    CHAPTER 2 SPEECH SEGMENTATION
      2.1 ACOUSTIC MODEL TRAINING
        2.1.1 Speech Corpora
        2.1.2 Acoustic Model Design
      2.2 RECOGNITION NETWORK GENERATION
        2.2.1 Word-Internal and Cross-Word Expansion
        2.2.2 Pronunciation Confusion Network (PCN)
      2.3 SPEECH SEGMENTATION & PCN APPROACH
    CHAPTER 3 FORMANT-LEVEL ASSESSMENT
      3.1 FORMANT AND FORMANT FREQUENCY
      3.2 RELATION BETWEEN ARTICULATION AND FORMANT
      3.3 FORMANT NORMALIZATION
      3.4 FORMANT-BASED HMM
      3.5 FORMANT-LEVEL ASSESSMENT
        3.5.1 Deriving a GMM for Each Phone Model
        3.5.2 Ranking Process
        3.5.3 Rank-Based Confidence Measure
      3.6 FEEDBACK GENERATION
    CHAPTER 4 EXPERIMENTAL RESULTS
      4.1 INTRODUCTION
      4.2 THE CORPORA
        4.2.1 Corpus for Acoustic Model Training
        4.2.2 Test Data for Mispronunciation Detection
      4.3 EXPERIMENT 1: RECOGNITION ACCURACY FOR TIMIT
      4.4 EXPERIMENT 2: PHONETIC SEGMENTATION ACCURACY FOR TIMIT
      4.5 EXPERIMENT 3: RECOGNITION ACCURACY FOR EAT
      4.6 EXPERIMENT 4: RECOGNITION ACCURACY USING WORD-INTERNAL & CROSS-WORD NETWORK EXPANSION FOR TIMIT AND EAT
      4.7 EXPERIMENT 5: DETERMINING THE NUMBER OF MIXTURES FOR THE GMMS USED IN FORMANT-LEVEL ASSESSMENT
      4.8 EXPERIMENT 6: DETERMINING THE THRESHOLD FOR RCM
    CHAPTER 5 CONCLUSIONS & FUTURE WORK
    REFERENCES

    Full-text availability: not authorized for public release (on-campus or off-campus network).
