
Author: 陳亮宇
Chen, Liang-Yu
Thesis title: 基於學習排序與類別標準化動態規劃量化法之自動發音評分
Automatic Pronunciation Scoring with Score Combination by Learning to Rank and Class-Normalized DP-based Quantization
Advisor: 張智星
Jyh-Shing Roger Jang
Oral defense committee: 簡仁宗
Chien, Jen-Tzung
張俊盛
Chang, Jason S.
王逸如
Wang, Yih-Ru
曹昱
Tsao, Yu
Degree: Doctoral
Doctor
Department: College of Electrical Engineering and Computer Science - 資訊系統與應用研究所
Institute of Information Systems and Applications
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Number of pages: 64
Keywords (Chinese): 自動發音評分, 電腦輔助語言學習, 電腦輔助發音訓練, 學習排序
Keywords (English): automatic pronunciation scoring, computer assisted language learning, computer assisted pronunciation training, learning to rank
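  • This doctoral thesis describes our proposed framework for automatic pronunciation scoring based on learning to rank and class-normalized dynamic-programming quantization. The aim is to train a model that automatically scores the pronunciation of second-language learners so that its scores are as close as possible to those given by human teachers. Under this framework, every sentence read by a learner is given a score from 1 to 5 by human teachers, and this score is treated as the learning target for model training. The corpus used in this study was rated by English teachers in Taiwan. Each uttered sentence is first scored with nine phone-level scoring methods, and four conversion methods then turn these into word-level scores. We select the sixteen best-performing word-level scores as input features for the learning-to-rank algorithm, and the algorithm's output is converted into a discrete score from 1 to 5 with our proposed quantization method. This quantization method is a class-normalized dynamic-programming quantization, which substantially alleviates the problems caused by imbalance in the amount of data across classes. Experimental results confirm that, compared with previously proposed methods, our scoring framework achieves a higher correlation coefficient with human scores and higher precision in mispronunciation detection. Finally, we also release the scored corpus used in this study.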


    This thesis describes an automatic pronunciation scoring framework that combines learning to rank with class-normalized, dynamic-programming-based (DP-based) quantization. The goal is to train a model that grades the pronunciation of a second-language learner so that the predicted score is as close as possible to the one given by a human teacher. Under this framework, each utterance is given a score of 1 to 5 by human raters, which is treated as the ground-truth rank for the training algorithm. The corpus was rated by qualified English teachers in Taiwan (nonnative speakers). Nine phone-level scores are computed and converted into word-level scores through four conversion methods. We select the 16 best-performing word-level scores as the input features for training the learning-to-rank function. The output of the function is then quantized to a discrete rank on a 1-5 scale. The quantization uses class normalization to alleviate the problem of data imbalance across classes. Experimental results show that the proposed framework achieves a higher correlation with the human scores than other methods, along with higher accuracy in detecting mispronunciations. We also release a new version of our nonnative corpus with human rankings.
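
The abstract does not spell out the quantization algorithm, so the following is only a minimal sketch of how a class-normalized, DP-based quantizer could work, not the thesis's actual formulation. It assumes that the learning-to-rank function yields one continuous score per item, that the human ranks are integers 1 to 5, that the per-item cost is the absolute rank error, and that class normalization means weighting each item by the inverse frequency of its human rank. The names `fit_thresholds` and `quantize` are hypothetical.

```python
import numpy as np

def fit_thresholds(scores, labels, n_classes=5, class_normalized=True):
    """Find n_classes-1 cut points on the score axis by dynamic programming.

    Assumption (not from the thesis): cost = weighted absolute rank error,
    with weight 1/count(rank) when class_normalized is True.
    Labels are assumed to be integers in 1..n_classes.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(scores, kind="stable")
    s, y = scores[order], labels[order]
    n = len(s)

    # Per-class weights: inverse class frequency, or 1 for the plain variant.
    counts = np.bincount(y, minlength=n_classes + 1)[1:].astype(float)
    weights = 1.0 / np.maximum(counts, 1.0) if class_normalized else np.ones(n_classes)
    w = weights[y - 1]

    # cost[k-1, j]: weighted error if the j-th sorted item is assigned rank k.
    ranks = np.arange(1, n_classes + 1)[:, None]
    cost = w[None, :] * np.abs(ranks - y[None, :])

    # dp[k, j]: minimum cost of giving the first j sorted items
    # non-decreasing ranks that are all <= k.
    dp = np.full((n_classes + 1, n + 1), np.inf)
    dp[:, 0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(1, n + 1):
            dp[k, j] = min(dp[k - 1, j], dp[k, j - 1] + cost[k - 1, j - 1])

    # Backtrack to recover the rank assigned to each sorted item.
    assigned = np.zeros(n, dtype=int)
    k, j = n_classes, n
    while j > 0:
        if dp[k, j] == dp[k - 1, j]:
            k -= 1                      # rank k is not used at this position
        else:
            assigned[j - 1] = k
            j -= 1

    # Place each cut point midway between the two neighbouring sorted scores.
    thresholds = []
    for k in range(1, n_classes):
        b = int(np.sum(assigned <= k))  # number of items with rank <= k
        if b == 0:
            thresholds.append(-np.inf)
        elif b == n:
            thresholds.append(np.inf)
        else:
            thresholds.append(0.5 * (s[b - 1] + s[b]))
    return np.array(thresholds)

def quantize(scores, thresholds):
    """Map continuous scores to ranks: 1 + number of cut points below the score."""
    return 1 + np.searchsorted(thresholds, np.asarray(scores, dtype=float), side="left")

# Toy data: one rank-1 item surrounded by many rank-3 items.
toy_scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_labels = [3, 3, 1, 3, 3, 3, 3, 3, 3, 3]
normalized = quantize(toy_scores, fit_thresholds(toy_scores, toy_labels))
plain = quantize(toy_scores, fit_thresholds(toy_scores, toy_labels, class_normalized=False))
```

On this toy data the unweighted fit assigns rank 3 to every item, while the class-normalized fit assigns rank 1 to the three lowest scores. This illustrates how inverse-frequency weighting can keep a rare class from being absorbed by a dominant one. Tied scores are not treated specially in this sketch.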

    Chapter 1. Introduction
    Chapter 2. Speech Corpora
        2.1 Native Corpus
        2.2 Non-native Corpus
        2.3 Human Ranks for Pronunciation Assessment
    Chapter 3. Basic Pronunciation Scoring
        3.1 HMM-based Log-Likelihood Score
        3.2 HMM-based Log-Posterior Probability Score
        3.3 Duration Distribution Score
        3.4 Segment Classification Score
        3.5 Likelihood Distribution Score
        3.6 Posterior and Log-Posterior Distribution Score
        3.7 Rank Ratio Score
    Chapter 4. Score Combination using Learning to Rank
        4.1 Basic Pronunciation Scoring
        4.2 Conversion from Phone-level to Word-level Scores
        4.3 Learning to Rank Function
        4.4 LTR Score Quantization
    Chapter 5. Experimental Results
        5.1 Evaluation of Basic Scores using Different Conversion Methods
        5.2 Evaluations of Class-Normalized Score Quantization
        5.3 Performance Comparison of Various Score Combination Methods
    Chapter 6. Conclusions
    References


    Full-text availability: not authorized for public release (on-campus and off-campus networks)