
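Author: 陳宏瑞
Chen, Hung-Jui
Thesis title: 使用多重聲學模型以改進台語語音評分
Improving Taiwanese Pronunciation Scoring via Multiple Acoustic Models
Advisor: 張智星
Jang, Jyh-Shing Roger
Oral defense committee: 江永進
呂仁園
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2011
Academic year of graduation: 99
Language: Chinese
Number of pages: 41
Keywords: hidden Markov models, acoustic models, multiple models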
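  • This thesis focuses on improving pronunciation scoring by using multiple acoustic models. Scoring is rank-based, so the experiments are also evaluated with a rank-based measure. The thesis describes five training methods for building acoustic models. The first addresses no particular problem and is simply the basic training procedure. The other four each target one problem: inaccurate segmentation, Taiwanese accent variation, inaccurate scoring of consonants, and insufficient training of some right-context-dependent models.

    First, for inaccurate segmentation: because segmentation at the beginning and end of an audio file tends to be inaccurate, we exclude the first and last phonemes from the training audio and train a set of acoustic models on the resulting corpus. For the accent problem: the phoneme models er and o are pronounced very similarly in some accents and are easily confused, so this experiment raises the number of Gaussian mixture components of the er and o phoneme models. For the third problem, inaccurate consonant scoring: consonants have long been a difficulty in all kinds of speech processing, so this experiment raises the number of Gaussian mixture components of all phoneme models. Finally, for the insufficiently trained right-context models: some phonemes occur rarely in right-context form but often as monophones, so we increase the number of training iterations of the monophone models in the hope of improving these models. The experimental results confirm that the first three of these models effectively alleviate their respective problems, whereas the overall performance of the fourth is slightly lower than that of the original model. Nevertheless, the models produced by the five training methods each score a different set of phonemes better.

    This thesis therefore proposes a multiple-model scoring method that combines several models. Using the five models, we first run an inside test to measure how each phoneme performs under each model and to find the model that corresponds to each phoneme. We then run an outside test in which all five models score the test corpus. Using the model-to-phoneme mapping learned in the inside test, we take each phoneme's result from its corresponding model and combine these into a complete scoring result. The final experimental results show that the multiple-model method improves scoring performance beyond that of any single model produced by the five training methods.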
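    The inside-test and outside-test procedure described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names, the dictionary layouts, the model labels, and the "higher is better" performance measure are all assumptions made for the example.

```python
from typing import Dict, List


def select_model_per_phone(inside_results: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Pick, for each phone, the model with the best inside-test performance.

    inside_results[model][phone] is an assumed per-phone performance figure
    (higher is better). On ties, the model seen first is kept.
    """
    best: Dict[str, str] = {}
    for model, per_phone in inside_results.items():
        for phone, perf in per_phone.items():
            if phone not in best or perf > inside_results[best[phone]][phone]:
                best[phone] = model
    return best


def combine_scores(
    phone_seq: List[str],
    scores_by_model: Dict[str, List[float]],
    phone_to_model: Dict[str, str],
    default_model: str,
) -> List[float]:
    """Build one score sequence by taking each phone's score from its mapped model.

    scores_by_model[model][i] is the score that model gave to the i-th phone
    of the test utterance. Phones absent from the mapping fall back to
    default_model (a hypothetical choice for this sketch).
    """
    combined = []
    for i, phone in enumerate(phone_seq):
        model = phone_to_model.get(phone, default_model)
        combined.append(scores_by_model[model][i])
    return combined


# Toy inside-test figures for two hypothetical models.
inside = {
    "general": {"a": 0.80, "er": 0.60},
    "accent": {"a": 0.75, "er": 0.70},
}
mapping = select_model_per_phone(inside)

# Toy outside-test scores for the utterance ["a", "er"].
outside_scores = {"general": [90.0, 50.0], "accent": [85.0, 65.0]}
final_scores = combine_scores(["a", "er"], outside_scores, mapping, "general")
```

    In this toy run the phone "a" keeps the score from the general model and "er" takes its score from the accent model, which mirrors the per-phoneme selection the abstract describes.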


    This thesis proposes the use of multiple acoustic models to improve Taiwanese pronunciation scoring. All pronunciation scoring in this research is based on the rank of a phone model against its competing models, so performance evaluation is also rank-based. Five training methods are used to generate acoustic models for different purposes. The first method trains general-purpose acoustic models on the entire training corpus, and these models serve as our baseline system. The other four training methods each address a different problem encountered in pronunciation scoring: inaccurate forced alignment, variation in Taiwanese accents, unreliable pronunciation scoring of consonant phonemes, and insufficient training data for certain right-context-dependent biphone models.

    First, since forced alignment results at the beginning and end of a sentence are usually inaccurate, we discard all sentence-initial and sentence-final phoneme segments from the training data and train our acoustic models on the remaining data. For the problem of accent variation, we found that certain occurrences of the "er" and "o" sounds in our training data were pronounced similarly and are easily confused by our speech recognition system. We address this problem by explicitly increasing the number of mixture components of their corresponding biphone models. For the problem of unreliable scoring of consonant phonemes, consonants are usually short in duration and lack a stable waveform, which makes them more difficult to model. We tackle this problem by increasing the number of mixture components of all models. For the problem of insufficient training data for certain right-context-dependent biphone models, we found that certain biphone instances are rarely seen in our training data, while their corresponding monophone instances are abundant. We therefore increase the number of training iterations on these monophone models before extending them into biphone models. The experimental results show that the first three of these four methods effectively improve scoring performance, while the last method shows a slight decrease in performance.

    However, we also found that the acoustic models produced by each of the five training methods score a different set of phone models well. We therefore propose a method that uses multiple acoustic models for pronunciation scoring. We identify the best phone model among the five types of acoustic models by running an inside test, and then carry out an outside test in which each phone is scored with its corresponding model. The experimental results show that the proposed method performs better than any of the five individual models.

    Key words: hidden Markov models, acoustic models, multiple acoustic models.
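    The abstract states that scoring is based on the rank of a phone model against its competing models. A minimal sketch of that idea follows, assuming each speech segment already has a log-likelihood under every candidate phone model. The function names and the linear rank-to-score mapping are hypothetical and are not taken from the thesis.

```python
from typing import Dict


def rank_of_target(log_likelihoods: Dict[str, float], target: str) -> int:
    """Return the 1-based rank of the target phone among all candidate phones.

    Rank 1 means no competing phone model gave the segment a higher
    log-likelihood than the target phone model did.
    """
    target_ll = log_likelihoods[target]
    better = sum(
        1 for phone, ll in log_likelihoods.items()
        if phone != target and ll > target_ll
    )
    return 1 + better


def rank_to_score(rank: int, n_models: int) -> float:
    """Hypothetical linear mapping: rank 1 gives 100, the last rank gives 0."""
    if n_models <= 1:
        return 100.0
    return 100.0 * (n_models - rank) / (n_models - 1)


# Toy log-likelihoods of one segment under four phone models.
segment_ll = {"a": -120.0, "o": -100.0, "er": -110.0, "i": -150.0}
rank_a = rank_of_target(segment_ll, "a")
score_a = rank_to_score(rank_a, len(segment_ll))
```

    Here two competitors ("o" and "er") outscore the target "a", so the target ranks third of four. A rank-based measure of this kind depends only on the ordering of the models, not on the absolute likelihood values.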

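    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1. Research Topic
        1.2. Overview of the Taiwanese Pronunciation Scoring System
        1.3. Research Direction and Main Results
        1.4. Chapter Overview
    Chapter 2 Literature Review
        2.1. Feature Extraction
        2.2. Building Hidden Markov Models
        2.3. Pronunciation Scoring
    Chapter 3 Research Method
        3.1. System Architecture
        3.2. Training and Test Data
        3.3. Experimental Method
            3.3.1. Model 1: General Model
            3.3.2. Model 2: Head-and-Tail Model
            3.3.3. Model 3: Accent Feature Model
            3.3.4. Model 4: Consonant Improvement Model
            3.3.5. Model 5: Right-Context Model Improvement Model
            3.3.6. Multiple Models
    Chapter 4 Result Analysis
        4.1. Evaluation Method
        4.2. Experimental Data
            4.2.1. Model 1: General Model
            4.2.2. Model 2: Head-and-Tail Model
            4.2.3. Model 3: Accent Feature Model
            4.2.4. Model 4: Consonant Improvement Model
            4.2.5. Model 5: Right-Context Model Improvement Processing Model
            4.2.6. Multiple Models
        4.3. Analysis of Experimental Results
        4.4. Research Tools
    Chapter 5 Conclusions and Future Research Directions
        5.1. Conclusions
        5.2. Future Research Directions
    Chapter 6 References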


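    Full-text release date: the full text is not authorized for public release (on-campus network)
    Full-text release date: the full text is not authorized for public release (off-campus network)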