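Graduate student: | 董姵汝 Tung, Pei-Ju |
---|---|
Thesis title: | 使用音高資訊來改進日文發音評量 Improving Japanese Pronunciation Assessment by Utilizing Pitch Information |
Advisor: | 張智星 Jang, Jyh-Shing Roger |
Committee members: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2010 |
Academic year of graduation: | 98 |
Language: | Chinese |
Number of pages: | 41 |
Keywords (Chinese): | 發音評量 (pronunciation assessment), 音高資訊 (pitch information), 電腦輔助發音訓練 (computer-assisted pronunciation training), 電腦輔助語言學習 (computer-assisted language learning) |
Keywords (English): | pronunciation assessment, pitch information, CAPT, CALL |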
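The aim of this thesis is to improve Japanese pronunciation assessment by adding pitch information, and to test the performance of the improved models with assessment-related measures.
We first use Mel-frequency cepstral coefficients (MFCCs) and log energy as features, and apply a systematic transcription-adjustment procedure so that the baseline acoustic models are trained on transcriptions closer to the actual pronunciation. Next, in addition to MFCCs and log energy, we add pitch features to improve on the baseline models. For pitch extraction we use two pitch-tracking methods, ACF (autocorrelation function) and UPDUDP (unbroken pitch determination using dynamic programming), which produce a broken (discontinuous) pitch contour and an unbroken (continuous) pitch contour, respectively.
To test how the improved models perform in pronunciation assessment, we use two assessment-related evaluation methods: a ranking-based confidence measure and pronunciation error detection. In our experiments, the improved models outperform the baseline acoustic models in overall assessment performance, but not every phoneme benefits from the added pitch features. We therefore also experiment with selectively loading either the pitch-added model or the baseline model; the results show a further slight improvement in assessment performance over non-selective loading.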
The aim of this work is to improve Japanese pronunciation assessment by utilizing pitch information, and the performance of the proposed method is evaluated against several performance measures.
First, the baseline models are built using MFCCs (Mel-frequency cepstral coefficients) and log energy as features. The transcriptions are adjusted systematically to account for the characteristics of Japanese pronunciation. We then train the improved acoustic models, called pitch-added models, with MFCCs, log energy, and pitch. ACF (autocorrelation function) and UPDUDP (unbroken pitch determination using dynamic programming) are adopted as the pitch extraction methods, generating a broken pitch contour and an unbroken pitch contour, respectively.
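As a rough illustration of how an ACF-based tracker can yield a broken pitch contour, the following minimal sketch estimates one pitch value per frame and reports 0.0 for frames it judges unvoiced. It is not the thesis's implementation: the function names, the search range (`fmin`, `fmax`), the `voicing_threshold`, and the non-overlapping default framing are all assumptions made for this example.

```python
import math


def acf_pitch(frame, sample_rate, fmin=75.0, fmax=500.0, voicing_threshold=0.3):
    """Estimate the pitch (Hz) of one frame via autocorrelation.

    Returns 0.0 when the frame is judged unvoiced. The search range and
    threshold are illustrative assumptions, not values from the thesis.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove the DC offset
    energy = sum(s * s for s in x)           # autocorrelation at lag 0
    if energy == 0.0:
        return 0.0                           # silent frame: no pitch
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        acf = sum(x[i] * x[i + lag] for i in range(n - lag))
        normalized = acf / energy            # 1.0 would be a perfect match
        if normalized > best_val:
            best_lag, best_val = lag, normalized
    if best_lag == 0 or best_val < voicing_threshold:
        return 0.0                           # weak periodicity: unvoiced
    return sample_rate / best_lag


def broken_pitch_contour(signal, sample_rate, frame_size=480, hop=480):
    """Frame the signal and collect one pitch value per frame.

    Unvoiced frames stay at 0.0, so the contour has gaps ("broken" pitch).
    """
    contour = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        contour.append(acf_pitch(signal[start:start + frame_size], sample_rate))
    return contour


if __name__ == "__main__":
    sr = 16000
    voiced = [math.sin(2 * math.pi * 200.0 * i / sr) for i in range(480)]
    silence = [0.0] * 480
    print(broken_pitch_contour(voiced + silence, sr))
```

An unbroken contour, as produced by UPDUDP, would instead assign a pitch value to every frame; that dynamic-programming step is not sketched here.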
The performance of the proposed method is evaluated using a ranking-based confidence measure and pronunciation error detection. Experimental results show that the proposed method outperforms the baseline. However, unvoiced phonemes are considered to have no pitch values. We therefore load the models selectively, choosing between the pitch-added models and the original ones, and the experimental results show a slight improvement of the selective approach over the non-selective approach.
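The two ideas in this paragraph can be sketched together in a few lines. The example below is only an illustration under stated assumptions: the linear mapping from rank to a 0-100 score, the function names, and the set of phonemes that use the pitch-added models are all hypothetical and are not taken from the thesis.

```python
def rank_based_score(log_likelihoods, target):
    """Score a segment by the rank of the target phoneme among all models.

    log_likelihoods maps each phoneme model to the segment's log-likelihood.
    The linear rank-to-score mapping is an assumption for illustration.
    """
    ranked = sorted(log_likelihoods, key=log_likelihoods.get, reverse=True)
    rank = ranked.index(target)              # 0 means the target ranked first
    if len(ranked) == 1:
        return 100.0
    return 100.0 * (1.0 - rank / (len(ranked) - 1))


def select_model_set(phoneme, pitch_phonemes):
    """Pick the pitch-added model only for phonemes assumed to benefit from pitch."""
    return "pitch_added" if phoneme in pitch_phonemes else "baseline"


if __name__ == "__main__":
    # Hypothetical log-likelihoods of one segment under three phoneme models.
    scores = {"a": -10.0, "i": -12.0, "u": -15.0}
    print(rank_based_score(scores, "i"))

    # Hypothetical choice: voiced vowels use pitch-added models, "s" does not.
    pitch_phonemes = {"a", "i", "u", "e", "o"}
    print(select_model_set("a", pitch_phonemes), select_model_set("s", pitch_phonemes))
```

Because the score depends only on the target's rank among competing models rather than on raw likelihood values, it stays comparable across phonemes and across the two model sets.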