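Graduate student: | 游永昌 |
---|---|
Thesis title: | 三音豐富以及雙音豐富語音資料庫在語音辨識表現之探討 (A Comparison of the Biphone-rich and the Triphone-rich Speech Corpora in Automatic Speech Recognition) |
Advisor: | 江永進 |
Oral defense committee: | |
Degree: | Master |
Department: | College of Science, Institute of Statistics |
Year of publication: | 2004 |
Graduation academic year: | 92 |
Language: | Chinese |
Number of pages: | 44 |
Chinese keywords: | 語音辨識 (speech recognition), 台語 (Taiwanese), 雙音豐富 (biphone-rich), 三音豐富 (triphone-rich), 平衡詞 (balanced words), 雙音模型 (biphone model), 三音模型 (triphone model) |
Foreign-language keywords: | speech recognition, Taiwanese, biphone-rich, triphone-rich, balance corpus, RCD, triphone |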
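This thesis examines balanced word sets selected from a Taiwanese lexicon under two different balance conditions, using "cross-syllable biphone models" and "cross-syllable triphone models" respectively as the main selection criterion. The selected words are recorded as a biphone-rich and a triphone-rich speech corpus, and the speech recognition performance of the two is compared. The experiments are speaker dependent. The acoustic models are hidden Markov models, and the number of Gaussian mixture components in each acoustic model state is determined dynamically from the frame size of the acoustic model. Recognition results are analyzed for two search networks, a single-syllable network and a linear network.
The triphone-rich balanced word set has higher coverage of cross-syllable triphone models than the biphone-rich balanced word set, so the triphone-rich corpus was expected to give a higher recognition rate than the biphone-rich corpus when cross-syllable triphone models are used as the training models. In practice, the result with cross-syllable triphone models fell short of this expectation: the recognition rate of the triphone-rich corpus is somewhat higher, but the difference is not significant.
The corpora are first compared using intra-syllable acoustic models as the training models; because of insufficient training data, intra-syllable biphone models achieve a higher recognition rate than intra-syllable triphone models. Next, "intra-syllable acoustic models" and "cross-syllable acoustic models" are compared as training models: with the same number of mixtures, cross-syllable acoustic models mostly achieve a higher recognition rate than intra-syllable acoustic models, so using cross-syllable acoustic models as the training models is necessary. Their recognition time, however, is far longer, which is a topic worth further study.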
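The abstract states that the number of Gaussian mixture components per state is set dynamically from the frame size of each acoustic model, but it gives no rule or values. The following is a minimal sketch of one plausible rule, assuming that "frame size" means the number of training frames aligned to a state. The function name `mixtures_for_state`, the `frames_per_mixture` threshold, the `max_mixtures` cap, and the example state names are all hypothetical and are not taken from the thesis.

```python
def mixtures_for_state(frame_count, frames_per_mixture=50, max_mixtures=8):
    """Pick a Gaussian mixture count from the number of training frames.

    Assumption: states with more training frames can support more mixture
    components. The threshold and cap are hypothetical illustration values.
    """
    if frame_count < 0:
        raise ValueError("frame_count must be non-negative")
    # One component per `frames_per_mixture` frames, clamped to [1, max_mixtures].
    return max(1, min(max_mixtures, frame_count // frames_per_mixture))


# Hypothetical frame counts aligned to three HMM states.
frame_counts = {"state_a": 30, "state_b": 220, "state_c": 1000}
mixture_plan = {name: mixtures_for_state(n) for name, n in frame_counts.items()}
```

Under these assumed values, a sparsely observed state keeps a single Gaussian while a frequently observed state is capped at `max_mixtures`, which limits overfitting on units with little data.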
This thesis compares the performance of a speech recognition system trained on two speech corpora. From the dictionary of the Daiim input method, we select two sets of words that cover all the cross-syllable biphones and all the cross-syllable triphones, called biphone-rich and triphone-rich respectively. Complete coverage of the cross-syllable triphones requires about ten times as many words as complete coverage of the cross-syllable biphones. To allow a fair comparison, the biphone-rich corpus therefore consists of ten sets of words, each covering all the cross-syllable biphones. Notably, the triphone coverage of this biphone-rich corpus is much lower than that of the triphone-rich set.
With these words as the transcript, a male Taiwanese speaker recorded all the words as microphone speech. The resulting speech corpora, about 100 minutes each, are used to train the acoustic models. Although both perform quite well in tasks using linear-net and free-syllable-net recognition networks, the triphone-rich corpus shows no advantage over the biphone-rich corpus.
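The word selection described in the abstract can be illustrated with a small sketch. It uses a plain greedy set-cover heuristic, which is a stand-in and not necessarily the selection algorithm used in the thesis. It also assumes that a "cross-syllable" biphone or triphone is any run of 2 or 3 consecutive phones that spans a syllable boundary. The toy lexicon, the phone labels, and the function names `boundary_ngrams`, `greedy_cover`, and `coverage` are hypothetical.

```python
def boundary_ngrams(word, n):
    """Return the phone n-grams of `word` that span a syllable boundary.

    `word` is a list of syllables; each syllable is a tuple of phone labels.
    Assumption: "cross-syllable" units are boundary-spanning n-grams.
    """
    phones = [p for syllable in word for p in syllable]
    # Positions in `phones` where a new syllable begins (first one excluded).
    starts, pos = [], 0
    for syllable in word[:-1]:
        pos += len(syllable)
        starts.append(pos)
    units = set()
    for i in range(len(phones) - n + 1):
        # The window phones[i:i+n] spans boundary b when i < b < i + n.
        if any(i < b < i + n for b in starts):
            units.add(tuple(phones[i:i + n]))
    return units


def greedy_cover(lexicon, n):
    """Greedily pick words until every boundary n-gram in the lexicon is covered."""
    units = {w: boundary_ngrams(s, n) for w, s in lexicon.items()}
    uncovered = set().union(*units.values())
    selected = []
    while uncovered:
        # Pick the word covering the most uncovered units; ties go to sorted order.
        best = max(sorted(units), key=lambda w: len(units[w] & uncovered))
        selected.append(best)
        uncovered -= units[best]
    return selected


def coverage(selected, lexicon, n):
    """Fraction of the lexicon's boundary n-grams covered by `selected` words."""
    all_units = set().union(*(boundary_ngrams(s, n) for s in lexicon.values()))
    got = set().union(*(boundary_ngrams(lexicon[w], n) for w in selected))
    return len(got) / len(all_units) if all_units else 1.0


# Hypothetical toy lexicon: word id -> syllables as (initial, final) phone tuples.
lexicon = {
    "w1": [("t", "ai"), ("g", "i")],
    "w2": [("t", "ai"), ("w", "an")],
    "w3": [("k", "ai"), ("g", "i")],
    "w4": [("t", "ai"), ("g", "u")],
}

biphone_set = greedy_cover(lexicon, 2)
triphone_set = greedy_cover(lexicon, 3)
biphone_set_triphone_coverage = coverage(biphone_set, lexicon, 3)
```

In this toy lexicon, full biphone coverage needs fewer words than full triphone coverage, and the biphone-covering set leaves some cross-syllable triphones uncovered. This mirrors, on a tiny scale, the relationship between the two word sets described in the abstract.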