
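Graduate student: 牛學文
Niu Hsueh-Wen
Thesis title: 最小化音素錯誤鑑別式訓練法則應用於華語語者調適之研究
A Study on Minimum Phone Error Discriminative Training for Mandarin Chinese Speaker Adaptation
Advisor: 張智星
Jyh-Shing Jang
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 50
Chinese keywords: minimum phone error (最小化音素錯誤), speaker adaptation (語者調適), maximum likelihood linear regression (最大化相似度線性迴歸)
Foreign-language keywords: MPE, speaker adaptation, MLLR, MAP, regression tree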
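  • In practical applications of speech recognition, speaker adaptation is commonly used to adjust a speaker-independent acoustic model so that its recognition rate for a particular speaker improves. A common speaker adaptation technique is Maximum Likelihood Linear Regression (MLLR). Its idea is to cluster similar mixtures of the speech recognition model and then adjust each cluster, so that the recognition rate can be improved with only a small amount of adaptation data. Its drawback is that models that are close in pronunciation (such as ㄓ and ㄗ) already have very similar mixtures; if they are also assigned to the same cluster and adjusted together, the speaker's pronunciation habits can easily bias the models toward either ㄓ or ㄗ. As a result, although the overall recognition rate improves, the error rate on confusable sounds rises.
    This thesis applies the recently proposed Minimum Phone Error (MPE) discriminative training criterion: starting from a model that has already been speaker-adapted, it performs further MPE training on the adaptation data. It also adjusts the I-smoothing parameter, lowers or changes the weight of the maximum likelihood estimate in I-smoothing, and changes the phone lattice structure and the way phone accuracy is computed, with the aim of lowering the error rate on confusable sounds and further raising the overall recognition rate of the model. In addition, the thesis incorporates the regression tree concept: using the clusters in the regression tree as the basis, it adjusts the I-smoothing weight parameter of MPE, with the goal of giving the adapted acoustic model a better recognition rate on the phones in each regression tree cluster.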


    To reduce the error rate of speech recognition, speaker adaptation techniques are often used to adjust speaker-independent acoustic models. MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum a Posteriori) are two of the most popular techniques in recent years. MLLR uses regression trees and calculates a transform matrix for each leaf node of the tree, which makes it possible to reduce the error rate of HMM-based speech recognition with fewer adaptation sentences. However, when we examined the recognition results, we found that although the overall error rate decreased, the error rate of certain confusable phones increased.
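The per-class transform idea described above can be illustrated with a minimal sketch. It assumes the transforms have already been estimated and shows only how one affine transform (A, b) per regression class is applied to every Gaussian mean assigned to that class. The function names, phone labels, class labels, and numbers are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def apply_mean_transform(A: Matrix, b: Vector, mu: Vector) -> Vector:
    """Return the adapted mean A * mu + b."""
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
        for row, b_i in zip(A, b)
    ]


def adapt_means(
    means: Dict[str, Vector],
    class_of: Dict[str, str],
    transforms: Dict[str, Tuple[Matrix, Vector]],
) -> Dict[str, Vector]:
    """Apply to each mean the transform shared by its regression class."""
    adapted = {}
    for name, mu in means.items():
        A, b = transforms[class_of[name]]
        adapted[name] = apply_mean_transform(A, b, mu)
    return adapted


# Hypothetical toy data: two confusable phones fall into the same class "c0",
# so both are moved by exactly the same transform.
means = {"zh": [1.0, 2.0], "z": [1.1, 2.1], "a": [5.0, -1.0]}
class_of = {"zh": "c0", "z": "c0", "a": "c1"}
transforms = {
    "c0": ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5]),
    "c1": ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),
}
adapted = adapt_means(means, class_of, transforms)
```

Because "zh" and "z" share a transform in this toy setup, the gap between their means is left unchanged by the adaptation step, which is one way to picture why confusable phones can stay confusable after MLLR.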
    To solve this problem, we propose using MPE (Minimum Phone Error) discriminative training. We use the same corpus as in MLLR adaptation and apply MPE to further adjust acoustic models that have already been adapted by MLLR. We also tested several methods, such as adjusting I-smoothing factors or phone lattices, to obtain better results. In addition, we introduced a new approach that reduces the computation time of both lattice construction and MPE weight calculation, based on better use of n-best recognition (Section 3.3.3).
    Furthermore, we propose a new method that combines the statistics of regression trees with the I-smoothing factor, based on the observations in Section 2.1.3. Experimental results show that it further reduces the error rate.
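As a rough illustration of how an I-smoothing factor enters an MPE-style mean update, the sketch below follows the commonly cited extended Baum-Welch form, in which tau "points" of maximum likelihood statistics are added to the numerator statistics before the update. It handles one scalar dimension of one Gaussian. The per-class lookup `tau_by_class`, the class labels, and all numbers are assumptions made for illustration; this record does not state how the thesis actually derives class-dependent weights.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GaussStats:
    gamma_num: float  # numerator occupancy
    x_num: float      # numerator first-order sum
    gamma_den: float  # denominator occupancy
    x_den: float      # denominator first-order sum
    gamma_ml: float   # maximum likelihood occupancy
    x_ml: float       # maximum likelihood first-order sum


def mpe_mean_update(old_mean: float, s: GaussStats, tau: float, D: float) -> float:
    """EBW-style mean update with I-smoothing (one dimension, one Gaussian)."""
    x_num, gamma_num = s.x_num, s.gamma_num
    if s.gamma_ml > 0.0:
        # I-smoothing: add tau points located at the ML mean to the numerator.
        x_num += tau * (s.x_ml / s.gamma_ml)
        gamma_num += tau
    return (x_num - s.x_den + D * old_mean) / (gamma_num - s.gamma_den + D)


# Hypothetical per-regression-class smoothing weights (illustrative only).
tau_by_class: Dict[str, float] = {"c0": 0.0, "c1": 6.0}

stats = GaussStats(gamma_num=4.0, x_num=8.0, gamma_den=2.0, x_den=2.0,
                   gamma_ml=10.0, x_ml=30.0)
mean_c0 = mpe_mean_update(2.0, stats, tau_by_class["c0"], D=4.0)
mean_c1 = mpe_mean_update(2.0, stats, tau_by_class["c1"], D=4.0)
```

With these toy statistics the ML mean is 3.0, and the larger tau pulls the updated mean closer to it. A smaller tau lets the discriminative statistics dominate, which is the trade-off a class-dependent weight would control.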

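    Table of Contents
    Acknowledgements
    Abstract
    Chinese Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1. Introduction
      1.1. Motivation
      1.2. Related Work
        1.2.1. Related work on speaker adaptation (using MLLR and MAP)
        1.2.2. Minimum phone error discriminative training
      1.3. Research Content
      1.4. Chapter Overview
    Chapter 2. Speaker Adaptation
      2.1. Maximum Likelihood Linear Regression
        2.1.1. Estimating the transform matrix for the mean
        2.1.2. Estimating the transform matrix for the variance
        2.1.3. Regression Class Trees
      2.2. MAP (Maximum a Posteriori)
      2.3. MPE (Minimum Phone Error) training
      2.4. MLLR+MAP+MPE
      2.5. Adjusting the smoothing factor based on the regression tree
    Chapter 3. Experimental Results and Discussion
      3.1. Corpus and model parameter settings
      3.2. Experimental procedure
      3.3. Experimental results and discussion
        3.3.1. Parameter settings
        3.3.2. Experiment 1: MPE speaker adaptation
        3.3.3. Experiment 2: Effect of N in N-best on lattice construction and recognition rate
        3.3.4. Experiment 3: Effect of I-smoothing on MPE speaker adaptation results
        3.3.5. Experiment 4: Setting the raw accuracy of silence in the phone lattice to 0
        3.3.6. Experiment 5: Effect of adding the correct transcription to the phone lattice on MPE speaker adaptation
        3.3.7. Experiment 6: Regression Tree Based MPE
    Chapter 4. Conclusion and Future Work
    Appendix 1: Pinyin to phone index mapping table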

    References
    [1] D. Povey, “Discriminative Training for Large Vocabulary Speech Recognition”, Ph.D. thesis, 2004.
    [2] Jen-Wei Kuo, “An Initial Study on Minimum Phone Error Discriminative Learning of Acoustic Models for Mandarin Large Vocabulary Continuous Speech Recognition”, thesis, 2005.
    [3] M.J.F. Gales, “Maximum Likelihood Linear Transformations for HMM-Based Speech Recognition”, May 1997 (revised January 1998).
    [4] M.J.F. Gales and P.C. Woodland, “Mean and Variance Adaptation within the MLLR Framework”, April 1996.
    [5] C.J. Leggetter and P.C. Woodland, “Flexible Speaker Adaptation using Maximum Likelihood Linear Regression”, Proc. ARPA Spoken Language Technology Workshop, pp. 104–109, Feb. 1995.
    [6] M.J.F. Gales and P.C. Woodland, “Speaker Adaptation of HMMs Using Linear Regression”, CUED/F-INFENG/TR. 181, June 1994.
    [7] The HTK Book (for HTK Version 3.4), Copyright 2001-2006 Cambridge University Engineering Department.
    [8] L.F. Uebel and P.C. Woodland, “Discriminative Linear Transforms for Speaker Adaptation”, ITRW on Adaptation Methods for Speech Recognition, Sophia Antipolis, France, August 29-30, 2001.
    [9] Shuo-Pin Hsu, “An Initial Study on Minimum Phone Error Discriminative Training for Continuous Phone Recognition System”, thesis, 2006.
    [10] D. Povey and P.C. Woodland, “Minimum Phone Error and I-smoothing for Improved Discriminative Training”, ICASSP 2002.
    [11] Shih-Sian Cheng, Yeong-Yuh Xu, Hsin-Min Wang and Hsin-Chia Fu, “Automatic Construction of Regression Class Tree for MLLR via Model-based Hierarchical Clustering”, IEEE Trans. on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 330-341, January 2006.
    [12] Seyed Mohammad Ahadi-Sarkani, “Bayesian and Predictive Techniques for Speaker Adaptation”, Ph.D. thesis, Cambridge University, U.K., 1996.

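    Full-text availability: not authorized for public release (on-campus network)
    Full-text availability: not authorized for public release (off-campus network)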
