| Author | 黃羿銘 (Huang, Yi-Ming) |
|---|---|
| Thesis title | 結合重音偵測與語調評分於口說英語評分系統 (Combining Stress Detection and Intonation Assessment for Spoken English Scoring) |
| Advisor | 張智星 (Jang, Jyh-Shing Roger) |
| Committee members | 王新民 (Wang, Hsin-Min); 蔡偉和 (Tsai, Wei-Ho) |
| Degree | Master |
| Department | Department of Computer Science, College of Electrical Engineering and Computer Science |
| Year of publication | 2011 |
| Academic year | 99 |
| Language | Chinese |
| Pages | 69 |
| Keywords (Chinese) | 重音偵測、語調評分、高斯混合模型、支撐向量機 |
| Keywords (English) | stress detection, intonation assessment, Gaussian mixture model, support vector machine |
This thesis describes our research on stress detection and intonation assessment. Stress detection aims to find the most prominent syllable in an utterance of a multisyllabic English word. In the intonation assessment system, an utterance pronounced by a student is compared with the same utterance pronounced by the teacher, and a score is produced based on the similarity of the two pitch contours.
Four stress detection methods are compared: three from previous work and the single-stage stress detection method proposed in this thesis. Our method trains a classifier for words of each syllable count (n = 2-5), using pitch-related features, energy-related features, and the duration extracted from the vowel segment of each syllable. The proposed method yields the best recognition rate among the four, outperforming the previous methods by about 2%. We also compare detection performance over every combination of the three feature groups (pitch-related, energy-related, and duration) and find that combining all of them gives the best recognition rate.
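The single-stage idea above can be sketched as follows. This is an illustrative mock-up, not the thesis implementation: `VowelSegment`, the particular feature list, and the stand-in scoring rule are all hypothetical. In the thesis, a trained classifier (one per word length, n = 2-5) maps the concatenated feature vector directly to the stressed-syllable index in one step.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class VowelSegment:
    """Measurements taken on one syllable's vowel segment (hypothetical layout)."""
    pitch: list      # pitch values (e.g., in semitones) inside the vowel
    energy: list     # frame energies inside the vowel
    duration: float  # vowel duration in seconds

def word_features(syllables):
    """Concatenate per-syllable pitch/energy/duration features into one vector,
    so a single classifier can output the stress position directly."""
    vec = []
    for seg in syllables:
        vec += [mean(seg.pitch), max(seg.pitch),
                mean(seg.energy), max(seg.energy),
                seg.duration]
    return vec

def predict_stress(syllables):
    """Stand-in for the trained classifier: pick the syllable whose summed
    features are largest (a real model learns this mapping from data)."""
    scores = [mean(s.pitch) + mean(s.energy) + s.duration for s in syllables]
    return scores.index(max(scores))
```

A two-syllable word yields a 10-dimensional vector here; training one classifier per word length keeps the vector dimension fixed within each class.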
The intonation assessment mechanism proposed in this thesis has two parts: assessment of each word and of the whole sentence. To build the mechanism, a corpus containing two types of utterances was recorded: utterances whose intonation patterns are similar to the teacher's, and utterances whose patterns are not. This makes assessment a two-class classification problem. Three classification methods are compared: one from previous work, one using a Gaussian mixture model, and one using a support vector machine. Among the three, the support vector machine gives the best recognition rate, outperforming the previous method by about 14%. We also evaluate every combination of three features: the correlation coefficient, the root-mean-square error, and the sorted error vector. The best result is obtained when only the correlation coefficient is used; adding the other features lowers the recognition rate. Finally, we compare two pitch-tracking methods (UPDUDP and the pitch tracker provided by Praat) and two sources for the pitch contour (vowel phones only versus all pitched phones). The best result is obtained when Praat's pitch tracker is used to extract the contour from all pitched phones.
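The three similarity features named above can be sketched as follows. This assumes the student contour has already been time-aligned to the teacher contour (equal length, e.g., via forced alignment and resampling), and the sorted-error-vector construction is only schematic, following the general idea of sorted frame errors in Ke and Xu [7]; function names are illustrative, not the thesis code:

```python
from math import sqrt

def correlation(teacher, student):
    """Pearson correlation between two aligned pitch contours."""
    n = len(teacher)
    mt, ms = sum(teacher) / n, sum(student) / n
    cov = sum((t - mt) * (s - ms) for t, s in zip(teacher, student))
    var_t = sum((t - mt) ** 2 for t in teacher)
    var_s = sum((s - ms) ** 2 for s in student)
    return cov / sqrt(var_t * var_s)

def rmse(teacher, student):
    """Root-mean-square error between the two contours."""
    return sqrt(sum((t - s) ** 2 for t, s in zip(teacher, student)) / len(teacher))

def sorted_error_vector(teacher, student, k=5):
    """Frame-wise absolute errors sorted ascending, subsampled to k values
    (an SEV-style feature; the exact construction in [7] may differ)."""
    errors = sorted(abs(t - s) for t, s in zip(teacher, student))
    step = len(errors) / k
    return [errors[min(int(i * step), len(errors) - 1)] for i in range(k)]
```

Note that the correlation is invariant to a constant pitch offset between speakers (a uniformly shifted contour still scores 1.0), which may help explain why it outperforms the error-based features for judging contour shape.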
To evaluate the performance of the intonation assessment system, a smaller corpus was recorded and rated by eight human raters. The inter-rater correlation coefficient is 0.80, and the correlation between system scores and human scores reaches 0.65, showing fairly high consistency between the system and the human raters.
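As a minimal sketch of how such agreement figures can be computed: the aggregation scheme below (mean pairwise Pearson correlation across raters, and correlation of system scores against the mean human score) is an assumption for illustration, not necessarily the thesis's exact protocol.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def inter_rater_corr(ratings):
    """Mean pairwise correlation over all rater pairs.

    `ratings` is a list of per-rater score lists over the same utterances."""
    pairs = list(combinations(ratings, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def system_human_corr(system_scores, ratings):
    """Correlation between system scores and the mean human score."""
    mean_human = [sum(col) / len(col) for col in zip(*ratings)]
    return pearson(system_scores, mean_human)
```

The inter-rater figure (0.80) serves as a ceiling: a system correlating with humans at 0.65 is approaching, but not matching, how well the humans agree with each other.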
[1]. J. C. Wells, English Intonation, Cambridge University Press, Cambridge, 2006.
[2]. D. Wang and S. Narayanan, "An acoustic measure for word prominence in spontaneous speech," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 690-701, 2007.
[3]. F. Tamburini and C. Caini, "An automatic system for detecting prosodic prominence in American English continuous speech," International Journal of Speech Technology, vol. 8, pp. 33-44, 2005.
[4]. J. Tepperman and S. Narayanan, "Automatic syllable stress detection using prosodic features for pronunciation evaluation of language learners," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, March 2005.
[5]. Q. Shi et al., "Spoken English assessment system for non-native speakers using acoustic and prosodic features," in Proc. Interspeech, 2010.
[6]. 曾璟鈺, "口說英語重音辨識之初步研究" (A preliminary study on stress recognition in spoken English), Master's thesis, National Tsing Hua University, 2008.
[7]. D. Ke and B. Xu, "Chinese intonation assessment using SEV features," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 4853-4856, 2009.
[8]. J. P. Arias et al., "Automatic intonation assessment for computer aided language learning," Speech Communication, vol. 52, pp. 254-267, 2010.
[9]. J.-C. Chen and J.-S. R. Jang, "TRUES: Tone recognition using extended segments," ACM Transactions on Asian Language Information Processing (TALIP), vol. 7, pp. 1-23, 2008.
[10]. P. Boersma and D. Weenink, Praat: doing phonetics by computer [Computer program], version 5.2.24, retrieved 10 May 2011 from http://www.praat.org/
[11]. 李俊毅, "語音評分" (Speech scoring), Master's thesis, National Tsing Hua University, 2001.
[12]. C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[13]. J.-S. R. Jang, "DCPR (Data Clustering and Pattern Recognition) Toolbox," available at http://mirlab.org/jang.