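Graduate student: | 李宛穎 Lee, Wan-Ying |
---|---|
Thesis title: | Improving Mandarin Chinese Pronunciation Assessment by Utilizing Pitch Information (使用音高資訊以改進華語發音評量) |
Advisor: | 張智星 Jang, Jyh-Shing |
Oral defense committee: | |
Degree: | Master's |
Department: | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
Year of publication: | 2010 |
Academic year of graduation: | 99 (ROC calendar, i.e., 2010-2011) |
Language: | Chinese |
Pages: | 37 |
Chinese keywords (translated): | Mandarin Chinese speech recognition, phone segmentation, acoustic model, forced alignment, pitch information |
English keywords: | Mandarin Chinese automatic speech assessment, forced alignment, co-articulation, acoustic model, phone segmentation |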
This thesis aims to improve the accuracy of phone segmentation for Mandarin Chinese. Well-trained acoustic models are the foundation of automatic speech assessment. A conventional automatic assessment system first segments the recorded speech with trained acoustic models, usually by forced alignment, and then compares the segmented utterance with the reference answer. However, forced alignment often places boundaries inaccurately in co-articulated speech. Here, co-articulation means consecutive finals across two syllables with no short pause between them, as in 蘇武 (ㄙㄨ ㄨˇ, "sū wǔ"), 一意 (ㄧˊ ㄧˋ, "yí yì"), and 無謂 (ㄨˊ ㄨㄟˋ, "wú wèi"). The boundary between two such syllables is frequently misaligned, which lowers segmentation accuracy and, in turn, the quality of the overall assessment.

We therefore propose to exploit the tonal nature of Mandarin Chinese by adding pitch features to acoustic-model training, with the expectation of improving segmentation accuracy for co-articulated syllables. In each example above the two syllables carry different tones, so the pitch contour offers a boundary cue even where the adjacent finals are spectrally similar. The improved models are evaluated with three metrics: sentence recognition rate, model ranking ratio, and segmentation accuracy, the last computed with both a one-pass and a two-pass approach. The first two metrics assess model reliability, while segmentation accuracy is the focus of this thesis. The results show that adding pitch features slightly lowers the ranking of co-articulated syllables under the model ranking ratio but still improves segmentation accuracy.
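The abstract does not say how pitch is extracted or encoded. As a minimal sketch, assuming a classic frame-level autocorrelation pitch tracker, linear interpolation of F0 across unvoiced frames, and a log-F0 plus delta-log-F0 pair appended to each spectral (e.g., MFCC) frame, pitch-augmented feature vectors could be built as follows. The function names and parameter values (60-500 Hz search range, 0.3 voicing threshold) are illustrative assumptions, not the thesis's configuration.

```python
import math
from typing import List, Optional, Sequence


def autocorr_pitch(frame: Sequence[float], sample_rate: int,
                   fmin: float = 60.0, fmax: float = 500.0,
                   voicing_threshold: float = 0.3) -> Optional[float]:
    """Estimate F0 in Hz for one frame, or return None if it looks unvoiced.

    Picks the lag with the highest energy-normalized autocorrelation inside
    the range of plausible pitch periods [sample_rate/fmax, sample_rate/fmin].
    Assumption: search range and voicing threshold are illustrative values.
    """
    n = len(frame)
    if n == 0:
        return None
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)         # r(0), used for normalization
    if energy == 0.0:
        return None                        # silent frame
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(n - 1, int(sample_rate / fmin))
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None                        # no strong periodicity: unvoiced
    return sample_rate / best_lag


def fill_unvoiced(f0s: List[Optional[float]]) -> List[float]:
    """Linearly interpolate F0 across unvoiced frames; leading/trailing gaps
    copy the nearest voiced value. All-unvoiced input yields zeros.
    Assumption: interpolation is one way to keep the pitch stream defined."""
    voiced = [i for i, f in enumerate(f0s) if f is not None]
    if not voiced:
        return [0.0] * len(f0s)
    out: List[float] = []
    for i, f in enumerate(f0s):
        if f is not None:
            out.append(f)
            continue
        left = max((j for j in voiced if j < i), default=None)
        right = min((j for j in voiced if j > i), default=None)
        if left is None:
            out.append(f0s[right])
        elif right is None:
            out.append(f0s[left])
        else:
            w = (i - left) / (right - left)
            out.append((1 - w) * f0s[left] + w * f0s[right])
    return out


def append_pitch_features(mfcc: List[List[float]],
                          f0s: List[Optional[float]]) -> List[List[float]]:
    """Append [log F0, delta log F0] to each spectral feature frame."""
    if len(mfcc) != len(f0s):
        raise ValueError("need one F0 value per feature frame")
    filled = fill_unvoiced(f0s)
    log_f0 = [math.log(f) if f > 0 else 0.0 for f in filled]
    T = len(log_f0)
    out = []
    for t in range(T):
        prev_v = log_f0[max(t - 1, 0)]
        next_v = log_f0[min(t + 1, T - 1)]
        delta = (next_v - prev_v) / 2.0    # central difference, edges clamped
        out.append(list(mfcc[t]) + [log_f0[t], delta])
    return out
```

For a clean 200 Hz sine sampled at 16 kHz in a 640-sample frame, `autocorr_pitch` picks a lag of 80 samples and returns 200.0. Interpolating across unvoiced frames is one common way to keep the feature stream defined at every frame; the delta term lets a model see the rising or falling contour that distinguishes, for example, a second tone from a fourth.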
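Forced alignment itself can be illustrated with a small Viterbi search. This sketch assumes one emitting state per phone, uniform (ignored) transition probabilities, and precomputed per-frame log-likelihoods `loglik[t][s]`; real HMM recognizers use several states per phone and trained transition probabilities, and `forced_align` is a hypothetical helper. Because the transcript is fixed, the search only decides the frame at which each phone begins, so adding pitch dimensions to the features can shift a boundary only through those per-frame log-likelihoods.

```python
from typing import List

NEG_INF = float("-inf")


def forced_align(loglik: List[List[float]]) -> List[int]:
    """Viterbi forced alignment over a fixed left-to-right state sequence.

    loglik[t][s] is the log-likelihood of frame t under the s-th state of the
    transcript. Each state must cover at least one frame. Assumption:
    transitions are uniform, so only emission scores matter.
    Returns the start frame of every state.
    """
    T, S = len(loglik), len(loglik[0])
    if T < S:
        raise ValueError("fewer frames than states")
    score = [[NEG_INF] * S for _ in range(T)]
    advanced = [[False] * S for _ in range(T)]  # True: entered state s at frame t
    score[0][0] = loglik[0][0]                  # alignment starts in the first state
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG_INF
            if move > stay:
                score[t][s], advanced[t][s] = move + loglik[t][s], True
            else:
                score[t][s] = stay + loglik[t][s]
    # Backtrace from the last state at the last frame; every "advance"
    # marks the frame where the current state began.
    starts = [0] * S
    s = S - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][s]:
            starts[s] = t
            s -= 1
    return starts
```

With three frames favoring the first phone and two favoring the second, `forced_align([[0, -5], [0, -5], [0, -5], [-5, 0], [-5, 0]])` returns `[0, 3]`.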
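The abstract does not define how segmentation accuracy is scored. One common convention, assumed here purely for illustration, counts a hypothesized boundary as correct when it falls within a fixed tolerance of the manually labeled boundary. The 20 ms tolerance and the `boundary_accuracy` helper are hypothetical, and the one-pass and two-pass variants mentioned in the abstract are not modeled.

```python
from typing import Sequence


def boundary_accuracy(ref: Sequence[float], hyp: Sequence[float],
                      tolerance: float = 0.02) -> float:
    """Fraction of hypothesized boundaries (seconds) within `tolerance` of the
    corresponding reference boundary.

    Both lists come from aligning the same transcript, so boundaries are
    paired by position. Assumption: 20 ms default tolerance is illustrative.
    """
    if len(ref) != len(hyp):
        raise ValueError("boundary lists must have equal length")
    if not ref:
        raise ValueError("no boundaries to score")
    hits = sum(1 for r, h in zip(ref, hyp) if abs(r - h) <= tolerance)
    return hits / len(ref)
```

For reference boundaries at 0.10, 0.35, and 0.60 s and hypotheses at 0.11, 0.40, and 0.605 s, two of the three fall within 20 ms, giving 2/3.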