
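Graduate student: Lee, Wan-Ying (李宛穎)
Thesis title: Improving Mandarin Chinese Pronunciation Assessment by Utilizing Pitch Information (使用音高資訊以改進華語發音評量)
Advisor: Jang, Jyh-Shing (張智星)
Committee members:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2010
Academic year of graduation: 99
Language: Chinese
Number of pages: 37
Keywords (translated from Chinese): Mandarin Chinese, speech recognition, phone segmentation, acoustic model, forced alignment, pitch information
Keywords (English): Mandarin Chinese automatic speech assessment, forced alignment, co-articulation, acoustic model, phone segmentation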
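  • This thesis aims to improve the accuracy of phone segmentation for Mandarin Chinese. A good acoustic model is the foundation of automatic speech assessment. Conventional computer-based speech assessment first segments the recorded speech with a trained acoustic model, usually by forced alignment, and then compares the segmented utterance with the correct answer. However, forced alignment often places boundaries inaccurately in co-articulated speech. Here co-articulation is defined as consecutive finals with no short pause between characters, for example 蘇武 (ㄙㄨ ㄨˇ), 一意 (一ˊ 一ˋ), and 無謂 (ㄨˊ ㄨㄟˋ). Such words reduce segmentation accuracy and in turn degrade the overall assessment.
    We therefore propose exploiting the tonal nature of Mandarin by adding pitch features to acoustic model training, with the expectation of improving segmentation accuracy for co-articulated speech. The improved model is evaluated in three ways: sentence recognition rate, model ranking ratio, and segmentation accuracy, the last measured with both a one-stage and a two-stage procedure. The first two assess model reliability, while segmentation accuracy is the focus of the thesis. The results show that, although adding pitch features slightly lowers the rank of co-articulated speech under the model ranking ratio, it still helps improve segmentation accuracy.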
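As a rough illustration of adding a pitch feature to each frame's feature vector, the Python sketch below estimates pitch with a plain autocorrelation peak picker and appends its logarithm as one extra dimension. This is not the UPDUDP tracker used in the thesis; the function names, the 60-500 Hz search range, and the use of log pitch are all assumptions made for the example.

```python
import math


def autocorr_pitch(frame, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    Returns 0.0 when the frame is silent or no positive peak is found.
    The fmin..fmax search range is an assumed default, not a thesis setting.
    """
    n = len(frame)
    if sum(x * x for x in frame) == 0.0:
        return 0.0
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(n - 1, int(sample_rate / fmin))
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        value = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else 0.0


def append_pitch(spectral_features, pitches):
    """Append log-pitch as one extra dimension per frame (0.0 if unvoiced)."""
    if len(spectral_features) != len(pitches):
        raise ValueError("one pitch value is needed per frame")
    return [
        list(vec) + [math.log(p) if p > 0 else 0.0]
        for vec, p in zip(spectral_features, pitches)
    ]
```

For a 200 Hz sine sampled at 8 kHz, the autocorrelation peak falls at a lag of 40 samples, so `autocorr_pitch` returns 200 Hz. A real system would also need voicing decisions and smoothing across frames, which this sketch omits.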


    This study aims to improve the accuracy of forced alignment for Mandarin Chinese. The performance of automatic speech assessment relies on the quality of the acoustic models. The first step of conventional automatic speech assessment is to perform model-based forced alignment on the input recording and then compare the aligned segments with the ground truth. However, forced alignment is not accurate enough for co-articulations. Here, we focus on co-articulations without a short pause between two syllables, for example 蘇武 ("sū wǔ"), 一意 ("yí yì"), and 無謂 ("wú wèi"). The boundaries between the two co-articulated syllables are heavily misaligned, which degrades the quality of the assessment.
    We therefore propose a new approach that exploits the tonal characteristics of Mandarin Chinese: additional pitch features are used to improve the accuracy of forced alignment. Three metrics are evaluated: sentence recognition rate, model ranking ratio, and alignment accuracy, measured with one-pass and two-pass alignment. The first two metrics focus on model reliability, and the third emphasizes the accuracy of alignment. The results show that alignment accuracy improves while the model ranking ratio drops slightly.
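To make the alignment-accuracy metric concrete, here is a minimal Python sketch that counts how many predicted syllable boundaries fall within a tolerance of the hand-labeled ones. The record does not give the thesis's exact definition, so the function name and the 20 ms tolerance are assumptions.

```python
def boundary_accuracy(reference, predicted, tolerance=0.02):
    """Fraction of predicted boundaries within `tolerance` seconds of the reference.

    Both lists hold boundary times in seconds, in the same order. Forced
    alignment keeps the transcript fixed, so the two lists have equal length.
    """
    if len(reference) != len(predicted):
        raise ValueError("boundary lists must have the same length")
    if not reference:
        return 0.0
    hits = sum(1 for r, p in zip(reference, predicted) if abs(r - p) <= tolerance)
    return hits / len(reference)
```

With reference boundaries `[0.30, 0.62, 1.00]` and predictions `[0.31, 0.70, 1.015]`, two of the three boundaries are within 20 ms, giving an accuracy of 2/3.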

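    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Overview
      1.2 Chapter Outline
    Chapter 2 Related Work
      2.1 Computer-Assisted Pronunciation Training with Automatic Speech Recognition
        2.1.1 Pronunciation Scoring
        2.1.2 Mispronunciation Detection
      2.2 Speech Features and Models
        2.2.1 Pitch Feature Extraction Methods
        2.2.2 Related Work on Acoustic Models
    Chapter 3 Method
      3.1 Baseline Acoustic Model
        3.1.1 Building the Baseline Acoustic Model
      3.2 Acoustic Model with Pitch Features
        3.2.1 Overview of Multiple Speech Features
        3.2.2 UPDUDP Pitch Tracking
        3.2.3 Building the Acoustic Model with Pitch Features
    Chapter 4 Evaluation Methods, Experiments, and Analysis of Results
      4.1 Speech Corpora
        4.1.1 Training Corpus
        4.1.2 Test Corpus
      4.2 Evaluation Methods
        4.2.1 Sentence Recognition Rate
        4.2.2 Model Ranking Ratio
        4.2.3 Segmentation Accuracy
      4.3 Experiment 1: Comparison of the Original Acoustic Model and the Acoustic Model with Pitch Features
        4.3.1 Purpose
        4.3.2 Procedure and Settings
        4.3.3 Results and Analysis
      4.4 Experiment 2: Comparison of the Original and Pitch-Feature Acoustic Models after Forced Alignment
        4.4.1 Purpose
        4.4.2 Procedure and Settings
        4.4.3 Results
        4.4.4 Discussion and Analysis
    Chapter 5 Conclusions and Future Work
    References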


    Full text: not authorized for public release (on-campus network, off-campus network, or the National Central Library's National Digital Library of Theses and Dissertations in Taiwan).