
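Author: 曾泓熹 (Tseng, Hong-Hsi)
Thesis title: 以句尾母音模型與鼻濁音發音變異來改善日語語音模型
Improving Japanese Acoustic Models by Sentence-end Vowel Models and Bidakuon Allophones
Advisor: 張智星 (Jang, Jyh-Shing Roger)
Oral defense committee: 呂仁園, 江永進
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2011
Academic year of graduation: 99 (ROC calendar)
Language: Chinese
Number of pages: 43
Keywords (translated from Chinese): speech recognition, pronunciation assessment, computer-assisted pronunciation training, bidakuon, Japanese, hidden Markov model
Keywords (English): automatic speech recognition, scoring, Computer-Assisted Language Learning, bidakuon, Japanese, HMM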
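  • Sentence-end vowel devoicing and bidakuon are problems frequently encountered in Japanese speech recognition. This thesis overcomes sentence-end vowel devoicing by adding dedicated sentence-end vowel models, and corrects the transcription of bidakuon with an automatic pronunciation-variation correction mechanism. Finally, assessment-related measures are used to test the difference between the improved and unimproved models.
    We extract features from the training corpus using mel-frequency cepstral coefficients (MFCCs) and log energy, and train with a dedicated vowel model added at the end of every sentence in the training corpus, in order to improve performance at the end of Japanese sentences. For bidakuon pronunciation variation, the automatic correction mechanism performs iterative correction, using a threshold on the rank ratio score as the basis for correcting the transcription, and systematically revises the training corpus step by step toward the transcription closest to the actual pronunciation.
    To test the models' performance in pronunciation assessment, we use three assessment-related test methods: a ranking-based confidence measure, free-mora decoding, and sentence recognition. In the experiments, the model trained with both bidakuon transcription correction and the dedicated sentence-end vowel models outperforms the baseline model.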


    Sentence-end vowel devoicing and bidakuon allophones are common problems in Japanese speech recognition. This thesis proposes specialized models for sentence-end vowel phones to overcome the devoicing problem, and an automatic transcription correction framework to handle bidakuon allophones.

    In this study, mel-frequency cepstral coefficients (MFCCs) and log energy are used as features for training the acoustic models. A dedicated sentence-end vowel model is applied at the end of each training sentence in order to improve recognition performance at sentence ends. To resolve the bidakuon allophone problem, an automatic transcription correction framework corrects the transcription iteratively, using thresholds on the ranking scores, so that the transcription gradually approaches the actual pronunciation recorded in the training data.
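
    As a rough illustration of the sentence-end vowel idea, the sketch below relabels the last vowel of a phone transcription so that it can be trained as a separate model. The phone set, the silence labels ("sil", "sp"), the "_end" suffix, and the function name are assumptions made for this example; the abstract does not specify the thesis's actual labeling scheme.

```python
# Hypothetical sketch: give an utterance-final vowel its own model label.
# The vowel set, silence labels, and "_end" suffix are illustrative assumptions.
VOWELS = {"a", "i", "u", "e", "o"}
SILENCE = {"sil", "sp"}


def mark_sentence_end_vowel(phones, suffix="_end"):
    """Return a copy of `phones` whose sentence-final vowel carries `suffix`.

    Trailing silence labels are skipped when locating the last phone.
    The transcription is returned unchanged if that phone is not a vowel.
    """
    relabeled = list(phones)
    for idx in range(len(relabeled) - 1, -1, -1):
        if relabeled[idx] in SILENCE:
            continue  # look past trailing silence
        if relabeled[idx] in VOWELS:
            relabeled[idx] = relabeled[idx] + suffix
        break  # only the last non-silence phone is considered
    return relabeled


# "desu" followed by a silence label
print(mark_sentence_end_vowel(["d", "e", "s", "u", "sil"]))
# prints ['d', 'e', 's', 'u_end', 'sil']
```

    With labels like these, the final vowel is trained separately from the same vowel elsewhere in the sentence, so devoiced realizations at the sentence end would not be mixed into the ordinary vowel model.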

    We use three performance measures to evaluate the proposed methods: a confidence measure based on phone-model ranking, free-mora decoding, and sentence recognition. The experimental results show that using both proposed methods together improves recognition performance over the baseline system.
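
    A ranking-based score of the kind mentioned in the abstract can be sketched as follows. The normalization (0.0 for the top-ranked model, 1.0 for the last), the threshold value, the phone names, and the log-likelihood numbers are all assumptions for illustration; the thesis's own rank ratio score and correction procedure may be defined differently.

```python
# Hypothetical sketch of a rank-based score and a threshold-based label fix.
# The normalization and the threshold are assumptions, not the thesis's formula.

def rank_ratio(scores, target):
    """Normalized rank of `target` among competing phone models.

    `scores` maps phone-model names to log-likelihoods for one segment
    (higher is better). Returns 0.0 if `target` ranks first, 1.0 if last.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return 0.0
    return ranked.index(target) / (len(ranked) - 1)


def corrected_label(scores, current, alternate, threshold=0.2):
    """Switch to `alternate` only if it outranks `current` by more than `threshold`."""
    gain = rank_ratio(scores, current) - rank_ratio(scores, alternate)
    return alternate if gain > threshold else current


# Made-up log-likelihoods for one segment transcribed as "g"
segment_scores = {"g": -120.0, "ng": -95.0, "k": -110.0, "n": -100.0, "m": -130.0}
print(rank_ratio(segment_scores, "g"))             # prints 0.75
print(corrected_label(segment_scores, "g", "ng"))  # prints ng
```

    In an iterative setting, a rule like this would be applied to every candidate segment, the models retrained on the revised transcription, and the process repeated until few labels change.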

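    Table of contents:
    List of Tables
    List of Figures
    Introduction
        Overview
        Research Objectives
        Terminology
        The Japanese Pronunciation Unit: the Mora
        Vowel Devoicing
        Dakuon and Bidakuon
    Related Work
        ASR-Based Computer-Assisted Pronunciation Training
        Pronunciation Scoring
        Pronunciation Error Detection
        Speech Features and Models
        Pronunciation Variation and Automatic Speech Recognition
        Adjusting the Labels of the Training Corpus
    Proposed Method
        Training Corpus
        Baseline Acoustic Model
        Overview of Multiple Speech Features
        Log Energy
        Adding Dedicated Sentence-End Vowel Models
        Sentence-End Vowel Model Method
        Automatic Pronunciation-Variation Correction Mechanism
        Rank Ratio Score
        Procedure of the Automatic Pronunciation-Variation Correction Mechanism
    Experimental Method and Analysis of Results
        Test Corpus
        Performance Evaluation Methods
        Ranking-Based Confidence Measure
        Free-Mora Decoding
        Sentence Recognition
        Experimental Setup and Procedure
        Experiment: Comparison of Acoustic Models 1, 2, and 3 with Model 4
        Purpose of the Experiment
        Experimental Results and Analysis
        Discussion and Analysis
    Conclusions and Future Work
        Conclusions
        Future Research Directions
    References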


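    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)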
