Graduate student: | 卓楷斌 |
---|---|
Thesis title: | 適用於華英雙語語音辨識之聲學單位合併方法 (Merging Acoustic Models for Improving Mandarin-English Bilingual Speech Recognition) |
Advisor: | 張智星 |
Oral defense committee: | 呂仁園, 江永進 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
Year of publication: | 2012 |
Academic year of graduation: | 100 (ROC calendar, 2011-2012) |
Language: | Chinese |
Number of pages: | 80 |
Keywords: | Mandarin-English bilingual recognition system; merging of bilingual acoustic models; Mandarin-English bilingual question sets |
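This thesis targets in-car devices with limited storage and aims to build a Mandarin-English bilingual speech recognition system for users in Taiwan. The goal is to reduce the model size effectively without using language identification, while achieving recognition performance comparable to that of unilingual recognition systems.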
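This study reduces the model size by merging acoustic units that are similar across the two languages. It integrates the different phonetic notations of Mandarin and English in several ways to find acoustic units suitable for cross-language merging. It then implements a variety of acoustic-unit merging methods to construct the bilingual system.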
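One way to search for mergeable units using linguistic knowledge is to map both languages' phone symbols onto a single shared notation. Units that land on the same symbol are then treated as merge candidates. The Python sketch below takes IPA as that shared notation. The small Pinyin and ARPAbet tables and the function name `merge_candidates` are hypothetical illustrations; they are not the thesis's actual phone sets or mappings.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny mapping tables (illustration only):
# Mandarin Pinyin initials and English ARPAbet phones -> IPA symbols.
MANDARIN_TO_IPA = {"m": "m", "n": "n", "f": "f", "s": "s", "l": "l", "x": "ɕ"}
ENGLISH_TO_IPA = {"M": "m", "N": "n", "F": "f", "S": "s", "L": "l",
                  "SH": "ʃ", "TH": "θ"}


def merge_candidates(zh_map, en_map):
    """Group units of both languages by shared IPA symbol.

    Returns {ipa: (mandarin_units, english_units)} for symbols that occur
    in both languages; each entry is a candidate for one shared model.
    """
    by_ipa = defaultdict(lambda: ([], []))
    for unit, ipa in zh_map.items():
        by_ipa[ipa][0].append(unit)
    for unit, ipa in en_map.items():
        by_ipa[ipa][1].append(unit)
    # Keep only symbols that have units from both languages.
    return {ipa: (zh, en) for ipa, (zh, en) in by_ipa.items() if zh and en}


if __name__ == "__main__":
    for ipa, (zh, en) in sorted(merge_candidates(MANDARIN_TO_IPA, ENGLISH_TO_IPA).items()):
        print(f"/{ipa}/: Mandarin {zh} + English {en} -> one shared model")
```

Under this exact-symbol rule, near neighbours stay separate: Mandarin `x` (/ɕ/) and English `SH` (/ʃ/) are not merged. Deciding whether such pairs should also be merged needs an acoustic, data-driven criterion.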
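Beyond merging acoustic units, this thesis also merges state units using decision trees. Experimental results show that the classification and merging rules built by the decision trees merge similar Mandarin and English states at a finer granularity. This both reduces the model size effectively and makes the models more robust. In the decision-tree merging experiments, the model size is reduced to one third of the original, and overall bilingual recognition performance is 1.2% higher than that of the baseline model.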
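Decision-tree state merging is commonly done in the HTK style:

1. Pool the states at the root of the tree.
2. Ask phonetic questions about each state's context.
3. Split a node only where the log-likelihood gain, computed from single-Gaussian statistics, is large enough.

The sketch below assumes this scheme. It also assumes the following:

- The root pools one state position of a hypothetical merged unit (Mandarin `a` with English `AA`).
- The bilingual question set uses phone classes that mix both languages, plus one pure language question.
- The 1-D statistics are made up.

`Stats`, `build_tree`, the `left-centre` state naming, and the thresholds are all illustrative. None of them are taken from the thesis.

```python
import math
import operator
from functools import reduce


class Stats:
    """Sufficient statistics of the frames aligned to one HMM state
    (diagonal covariance): occupancy n, per-dimension sum and sum of squares."""

    def __init__(self, n, s, ss):
        self.n, self.s, self.ss = n, list(s), list(ss)

    @classmethod
    def from_gaussian(cls, n, mean, var):
        # Rebuild the sums from a state's occupancy, mean and variance.
        return cls(n, [n * m for m in mean],
                   [n * (v + m * m) for m, v in zip(mean, var)])

    def __add__(self, other):
        return Stats(self.n + other.n,
                     [a + b for a, b in zip(self.s, other.s)],
                     [a + b for a, b in zip(self.ss, other.ss)])


def log_likelihood(st, var_floor=1e-3):
    """Log-likelihood of the pooled frames under one ML diagonal Gaussian:
    -0.5 * n * sum_d (log(2*pi*var_d) + 1); exact unless the floor is hit."""
    total = 0.0
    for s, ss in zip(st.s, st.ss):
        mean = s / st.n
        var = max(ss / st.n - mean * mean, var_floor)
        total += math.log(2 * math.pi * var) + 1.0
    return -0.5 * st.n * total


def left_context(name):
    # Hypothetical naming "left-centre", e.g. "zh_m-zh_a".
    return name.split("-")[0]


def build_tree(states, names, questions, min_gain=5.0, min_occ=50):
    """Greedily split a pool of states; each returned leaf is one tied state."""
    def pooled(group):
        return reduce(operator.add, (states[x] for x in group))

    parent = log_likelihood(pooled(names))
    best = None
    for q, phone_set in questions.items():
        yes = [x for x in names if left_context(x) in phone_set]
        no = [x for x in names if left_context(x) not in phone_set]
        if not yes or not no:
            continue  # question does not split this node
        if pooled(yes).n < min_occ or pooled(no).n < min_occ:
            continue  # a child would have too little data
        gain = log_likelihood(pooled(yes)) + log_likelihood(pooled(no)) - parent
        if best is None or gain > best[0]:
            best = (gain, q, yes, no)
    if best is None or best[0] < min_gain:
        return [names]  # stop: all states in this node share one model
    _, _, yes, no = best
    return (build_tree(states, yes, questions, min_gain, min_occ)
            + build_tree(states, no, questions, min_gain, min_occ))


if __name__ == "__main__":
    states = {  # toy 1-D statistics (hypothetical)
        "zh_m-zh_a": Stats.from_gaussian(100, [1.0], [0.5]),
        "en_M-en_AA": Stats.from_gaussian(80, [1.1], [0.5]),
        "zh_s-zh_a": Stats.from_gaussian(120, [4.0], [0.5]),
        "en_S-en_AA": Stats.from_gaussian(90, [4.2], [0.5]),
    }
    questions = {  # bilingual classes plus one pure language question
        "L_Nasal": {"zh_m", "zh_n", "en_M", "en_N"},
        "L_Fricative": {"zh_s", "zh_f", "en_S", "en_F"},
        "L_Mandarin": {"zh_m", "zh_n", "zh_s", "zh_f"},
    }
    for leaf in build_tree(states, list(states), questions):
        print("tied state:", leaf)
```

With these toy numbers, the tree first splits on the nasal/fricative context and then stops. This leaves two tied states, each covering one Mandarin state and one English state. The language question would win only if it gave the largest gain above `min_gain`.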
The long-term goal of this research is to build a Mandarin-English bilingual speech recognition system for in-car devices with limited storage. Accordingly, this thesis aims to reduce the model size effectively without using language identification, while keeping recognition performance comparable to that of a unilingual system.
In this thesis, acoustic models that are similar across the two languages are merged to reduce the number of model parameters. Similar acoustic units are identified by analyzing the two languages' different phonetic notations with either knowledge-driven or data-driven techniques.
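For the data-driven route, one common approach works in three steps:

1. Summarize each unit by a diagonal Gaussian.
2. Compute the Bhattacharyya distance between every Mandarin unit and every English unit.
3. Merge the closest cross-language pairs greedily and one-to-one, keeping only pairs whose distance is below a threshold.

This is an assumed approach: the abstract does not state which distance measure the thesis actually uses. All unit names, statistics, and the threshold below are hypothetical.

```python
import math

# Hypothetical single-Gaussian summaries (mean, variance) per unit, 2-D features.
ZH_UNITS = {
    "zh_m": ([1.0, 0.2], [0.3, 0.2]),
    "zh_s": ([3.0, 1.5], [0.4, 0.3]),
    "zh_x": ([2.5, 2.5], [0.4, 0.3]),
}
EN_UNITS = {
    "en_M": ([1.1, 0.25], [0.3, 0.25]),
    "en_S": ([3.1, 1.4], [0.4, 0.3]),
    "en_TH": ([6.0, -1.0], [0.5, 0.5]),
}


def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance in this dimension
        dist += (m1 - m2) ** 2 / (8.0 * v) + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return dist


def greedy_merge(zh_units, en_units, threshold):
    """Pair Mandarin and English units one-to-one, closest first,
    keeping only pairs whose distance is below threshold."""
    candidates = sorted(
        (bhattacharyya(*zh_units[z], *en_units[e]), z, e)
        for z in zh_units
        for e in en_units
    )
    used_zh, used_en, merged = set(), set(), []
    for dist, z, e in candidates:
        if dist >= threshold:
            break  # sorted, so every remaining pair is at least as far
        if z in used_zh or e in used_en:
            continue  # each unit joins at most one merged model
        used_zh.add(z)
        used_en.add(e)
        merged.append((z, e, dist))
    return merged


if __name__ == "__main__":
    for z, e, dist in greedy_merge(ZH_UNITS, EN_UNITS, threshold=0.1):
        print(f"merge {z} + {e} (D_B = {dist:.4f})")
```

The one-to-one constraint stops a single English unit from absorbing several Mandarin units. Units with no close partner, such as `zh_x` and `en_TH` here, keep their own language-specific models.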
In addition to merging whole acoustic models, this thesis proposes using decision trees to merge the states of different HMMs (hidden Markov models). Experimental results show that merging at this finer, state level via decision trees reduces the model size effectively and also makes the bilingual models more robust. Compared with the baseline models, state merging with decision trees reduces the model size to one third of the original and improves the bilingual recognition correct rate by 1.2%.