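Student: Shuo-Pin Hsu (許碩斌)
Thesis title: An Initial Study on Minimum Phone Error Discriminative Training for Continuous Phone Recognition System
Chinese title: 最小音素錯誤鑑別式訓練法則應用於連續音素辨識系統之初步研究
Advisor: Jyh-Shing Roger Jang (張智星)
Oral defense committee: (not listed)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication year: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 53
Keywords: Minimum Phone Error discriminative training criterion (最小音素錯誤鑑別式訓練法則); continuous phone recognition system (連續音素辨識系統)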
Conventional acoustic-model training uses the Maximum Likelihood Estimation (MLE) criterion, which does not take into account how well the models discriminate against one another during training, so some models are easily confused with each other. Discriminative training criteria were proposed to increase the separation between models. Because the Minimum Phone Error (MPE) criterion has produced notable gains in many reported experiments in recent years, this thesis applies it to a continuous phone recognition system built on the TIMIT corpus. Acoustic models are first trained with MLE and then retrained with MPE. The experimental results show that, compared with MLE alone, MPE training further reduces the phone error rate.

MPE represents the set of all competing sentence hypotheses with a phone lattice. This thesis mainly constructs the lattice from an N-best list (an N-Best Synthesized Lattice), so that training concentrates efficiently on the most confusable parts of the lattice. In addition, to highlight confusable phones and to filter out phones that are repeated at the same time position, the thesis implements another kind of phone lattice, the sausage, and uses this compacted lattice to improve the results of MPE training.
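The MPE criterion summarized above can be illustrated on a single N-best list. The following is a minimal sketch, not the thesis's implementation: it assumes each hypothesis arrives with an acoustic log-likelihood log p(O|s) and a phone language-model log-probability log P(s), scales both by an acoustic factor κ to form sentence posteriors P_κ(s|O), and evaluates F_MPE = Σ_r Σ_s P_κ(s|O_r) A(s, s_r). Phone accuracy A is computed exactly as the number of reference phones minus substitutions, deletions and insertions from a Levenshtein alignment; Povey and Woodland's lattice-based MPE uses an approximate per-arc accuracy instead. The class and function names (`Hypothesis`, `phone_accuracy`, `mpe_objective`, `mpe_hypothesis_weights`) and the value κ = 0.1 are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    phones: list[str]
    acoustic_logprob: float  # log p(O | s) from the acoustic model (assumed given)
    lm_logprob: float        # log P(s) from a phone language model (assumed given)


def phone_accuracy(hyp: list[str], ref: list[str]) -> int:
    """Exact accuracy = len(ref) - (substitutions + deletions + insertions),
    taken from the minimum edit (Levenshtein) distance between hyp and ref."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference to hyp[:j]
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # reference phone r deleted
                         cur[j - 1] + 1,          # hypothesis phone h inserted
                         prev[j - 1] + (r != h))  # match or substitution
        prev = cur
    return len(ref) - prev[-1]


def posteriors(nbest: list[Hypothesis], kappa: float) -> list[float]:
    """Scaled sentence posteriors P_kappa(s | O) over the N-best list (a softmax)."""
    scores = [kappa * (h.acoustic_logprob + h.lm_logprob) for h in nbest]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def mpe_objective(utterances, kappa: float) -> float:
    """F_MPE: sum over (nbest, reference) pairs of the posterior-weighted phone accuracy."""
    obj = 0.0
    for nbest, ref in utterances:
        post = posteriors(nbest, kappa)
        obj += sum(p * phone_accuracy(h.phones, ref) for p, h in zip(post, nbest))
    return obj


def mpe_hypothesis_weights(nbest, ref, kappa: float) -> list[float]:
    """Derivative of one utterance's F_MPE w.r.t. each hypothesis log-score:
    kappa * P(s|O) * (A(s) - average A). Positive values mark hypotheses that are
    more accurate than the posterior-weighted average, whose likelihood MPE raises."""
    post = posteriors(nbest, kappa)
    acc = [phone_accuracy(h.phones, ref) for h in nbest]
    avg = sum(p * a for p, a in zip(post, acc))
    return [kappa * p * (a - avg) for p, a in zip(post, acc)]


if __name__ == "__main__":
    ref = ["sil", "k", "ae", "t", "sil"]
    nbest = [
        Hypothesis(["sil", "k", "ae", "t", "sil"], -100.0, -5.0),  # correct
        Hypothesis(["sil", "k", "eh", "t", "sil"], -100.0, -5.0),  # one substitution
    ]
    print(mpe_objective([(nbest, ref)], kappa=0.1))  # 4.5: equal posteriors, accuracies 5 and 4
    print(mpe_hypothesis_weights(nbest, ref, kappa=0.1))
```

The weights sum to zero for each utterance, which reflects the discriminative character of MPE: probability mass is moved from less accurate hypotheses to more accurate ones rather than raising the likelihood of every training sentence as MLE does.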

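Chapter 1: Introduction
1.1 Research Motivation
1.2 Research Content
1.3 Related Work
1.4 Chapter Overview
Chapter 2: Discriminative Training Criteria
2.1 Bayes Risk
2.2 Maximum Mutual Information
2.3 Minimum Classification Error
Chapter 3: The Minimum Phone Error Training Criterion
3.1 The MPE Objective Function and Estimation of Phone Accuracy
3.2 Maximizing the Objective Function
3.3 Updating the Model Parameters
3.4 Another Kind of Phone Lattice: Sausage
Chapter 4: Experiments and Discussion of Results
4.1 Corpus Description and Model Parameter Settings
4.2 Experimental Procedure
4.3 Experimental Results and Discussion
4.3.1 Parameter Settings
4.3.2 Experiment 1: Mono-Phone vs. Bi-Phone Models under MPE
4.3.3 Experiment 2: With and Without the I-Smoothing Function
4.3.4 Experiment 3: Including or Excluding the Correct Sentence in the Phone Lattice
4.3.5 Experiment 4: Different Values of the Smoothing Constant D
4.3.6 Experiment 5: Correct Sentences from Manual Segmentation vs. Acoustic-Model Segmentation
4.3.7 Experiment 6: Sausage vs. N-Best Synthesized Lattice
4.3.8 Experiment 7: Alternating Training with Sausage and N-Best Synthesized Lattice
4.3.9 Experiment 8: Phone Lattice Size
Chapter 5: Conclusions and Future Work
Appendix A: Verification That the Smoothing Function Satisfies the Required Conditions
Appendix B: Derivation of the Model Parameter Update Formulas
Appendix C: Derivation and Setting of the Conditions on the Smoothing Constant D
References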
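The sausage named in the abstract and in Section 3.4 arranges competing phones that occupy the same stretch of time into a linear sequence of slots (the structure Mangu et al. call a confusion network). The sketch below is a deliberately simplified, hypothetical construction, not Mangu et al.'s clustering algorithm and not necessarily the thesis's procedure. It assumes frame-level time alignments for every N-best hypothesis and sentence posteriors that sum to 1. The segments of the highest-posterior hypothesis (the "pivot") serve as slot boundaries. Every phone arc is assigned to the pivot segment it overlaps most, identical phones in a slot are merged by summing their posteriors (so a phone repeated at the same time appears once), and hypotheses with no arc in a slot vote for an empty label `<eps>`. The names `Arc`, `build_sausage` and `consensus` are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Arc:
    phone: str
    start: int  # first frame (inclusive); frame-level alignment is an assumption
    end: int    # last frame (exclusive)


def overlap(a: Arc, b: Arc) -> int:
    """Number of frames shared by two arcs."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def build_sausage(hyps: list[list[Arc]], posts: list[float], eps: str = "<eps>"):
    """Simplified pivot-based sausage: one slot per segment of the best hypothesis.
    Returns a list of {phone: summed posterior} dictionaries, one per slot."""
    pivot = hyps[max(range(len(hyps)), key=lambda i: posts[i])]
    slots = [defaultdict(float) for _ in pivot]
    for arcs, post in zip(hyps, posts):
        chosen = {}  # slot index -> (overlap, phone); at most one phone per slot per hypothesis
        for arc in arcs:
            ov, k = max((overlap(arc, seg), k) for k, seg in enumerate(pivot))
            if ov > 0 and ov > chosen.get(k, (0, None))[0]:
                chosen[k] = (ov, arc.phone)
        for k, slot in enumerate(slots):
            # Identical phones from different hypotheses share one entry; a hypothesis
            # with nothing in this slot contributes its posterior to the empty label.
            label = chosen[k][1] if k in chosen else eps
            slot[label] += post
    return [dict(slot) for slot in slots]


def consensus(sausage, eps: str = "<eps>") -> list[str]:
    """Pick the most probable label in each slot and drop empty labels."""
    best = [max(slot, key=slot.get) for slot in sausage]
    return [p for p in best if p != eps]


if __name__ == "__main__":
    def arcs(*segs):
        return [Arc(p, s, e) for p, s, e in segs]

    hyp_a = arcs(("sil", 0, 10), ("k", 10, 20), ("ae", 20, 35), ("t", 35, 45), ("sil", 45, 55))
    hyp_b = arcs(("sil", 0, 10), ("k", 10, 20), ("eh", 20, 35), ("t", 35, 45), ("sil", 45, 55))
    hyp_c = arcs(("sil", 0, 12), ("k", 12, 22), ("ae", 22, 45), ("sil", 45, 55))
    sausage = build_sausage([hyp_a, hyp_b, hyp_c], [0.5, 0.3, 0.2])
    for slot in sausage:
        print(slot)
    print(consensus(sausage))  # ['sil', 'k', 'ae', 't', 'sil']
```

In this toy input the ae/eh confusion collapses into a single slot with probabilities of about 0.7 and 0.3, and the missing t in the third hypothesis becomes a 0.2 vote for `<eps>`. If one hypothesis places two arcs in the same slot, only the arc with the larger overlap is kept. This keeps each slot's probabilities summing to 1, at the cost of discarding some detail that a full confusion-network algorithm would preserve.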


Full-text availability: not authorized for public release on either the campus network or off-campus networks.
