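Graduate student: | 林宏俊 Hong-Jyun Lin |
---|---|
Thesis title: | 華語混淆音與耦合音之自動切分 Automatic Segmentation of Confusing Syllables and Highly Coarticulated Syllables in Mandarin Chinese |
Advisor: | 張智星 Jyh-Shing Roger Jang |
Oral defense committee: | |
Degree: | 碩士 Master |
Department: | 電機資訊學院 - 資訊系統與應用研究所 College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2008 |
Academic year of graduation: | 96 |
Language: | Chinese |
Number of pages: | 52 |
Keywords (Chinese): | 混淆音 (confusing syllables), 耦合音 (highly coarticulated syllables), 自動切分 (automatic segmentation) |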
Abstract
Automatic phoneme segmentation is currently performed with a technique called forced alignment, which can efficiently label the boundaries between phonemes across a large number of audio files. However, forced alignment sometimes places boundaries inaccurately or incorrectly. An analysis of these error cases shows that they are often caused by a noisy recording environment, unsuitable recording devices, or mispronunciation that results from the speaker's unfamiliarity with the language being recorded. In addition, traditional segmentation methods perform poorly in the specific case where the last phoneme of one character in a term is the same as the first phoneme of the next character.
This thesis addresses these problems with two methods. When the inaccuracy is caused by a strong first-language accent, it proposes Automatic Generation of Pronunciation Confusion Network (AGPCN), which combines the forced alignment algorithm with a pronunciation confusion network (PCN). When two adjacent characters are joined by the same phoneme, it uses tonal features in conjunction with the forced alignment algorithm. Experiments show that segmentation accuracy increases when the two proposed methods are applied.
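To make the idea of forced alignment concrete, here is a minimal sketch in Python. It is not the thesis's implementation. It assumes each frame already has a log-likelihood score for every unit of a known transcript (real systems obtain these from HMM acoustic models) and uses a Viterbi-style dynamic program to find the best left-to-right assignment of frames to units. The function name `forced_align`, the one-dimensional toy features, and the squared-distance scores are all illustrative assumptions.

```python
import math

def forced_align(frame_scores, n_units):
    """Assign each frame to one unit of a known transcript.

    frame_scores[t][k] is an (assumed, precomputed) log-likelihood of
    frame t under unit k. Units must be visited in order, and a frame
    may only stay in the current unit or advance to the next one.
    Returns the unit index chosen for every frame.
    """
    n_frames = len(frame_scores)
    neg_inf = -math.inf
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    back = [[0] * n_units for _ in range(n_frames)]
    best[0][0] = frame_scores[0][0]  # alignment must start in the first unit

    for t in range(1, n_frames):
        for k in range(n_units):
            stay = best[t - 1][k]
            move = best[t - 1][k - 1] if k > 0 else neg_inf
            if move > stay:
                best[t][k] = move + frame_scores[t][k]
                back[t][k] = k - 1
            else:
                best[t][k] = stay + frame_scores[t][k]
                back[t][k] = k

    # Backtrace from the last unit at the last frame.
    path = [n_units - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy data (hypothetical): 1-D "features" and one mean per unit.
frames = [0.1, 0.0, 0.9, 1.1, 2.0, 2.1]
means = [0.0, 1.0, 2.0]
scores = [[-(x - m) ** 2 for m in means] for x in frames]

path = forced_align(scores, len(means))
# A boundary is any frame whose unit differs from the previous frame's unit.
boundaries = [t for t in range(1, len(path)) if path[t] != path[t - 1]]
print(path, boundaries)
```

In this toy setting the boundaries are easy to find because adjacent units have clearly different scores. When two adjacent units are the same phoneme, their scores are nearly identical on every frame, so the boundary position is poorly determined. That is the situation the abstract describes for adjacent characters that share a phoneme.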
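Abstract (Chinese)
At present, traditional automatic segmentation relies on forced alignment. Its advantage is that it can label the boundaries in audio files quickly and in large volume, but it often labels phonetic boundaries incorrectly or imprecisely. Further analysis of these cases shows that the causes are usually an inadequate recording environment or recording equipment, or the speaker's unfamiliarity with the language being recorded, which leads to an inaccurate accent in the recording. In addition, because of how they are pronounced, certain words (such as 蘇武, 回憶, and 記憶) are generally segmented poorly by traditional methods. All of these factors reduce segmentation accuracy.
To address these problems, this thesis proposes two experimental methods. For speech that carries a noticeable native-language accent and is therefore segmented poorly, we combine traditional forced alignment with the concept of a Pronunciation Confusion Network (PCN) and propose segmentation based on Automatic Generation of Pronunciation Confusion Network (AGPCN). For the specific words on which traditional segmentation generally performs poorly, we add tonal features to traditional forced alignment and propose tonal-feature segmentation. The goal is for segmentation accuracy to improve once these two methods are implemented.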
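The pronunciation confusion network idea can be illustrated with a small Python sketch. It shows only the general concept of letting each canonical unit be replaced by likely confusions. The thesis's automatic generation procedure (AGPCN) is not described in this record and is not reproduced here. The confusion table, the pinyin-style unit labels, and the function names are hypothetical.

```python
from itertools import product

# Hypothetical confusion table: canonical unit -> units an accented
# speaker might produce instead. These entries are illustrative only.
CONFUSIONS = {"zh": ["z"], "sh": ["s"], "n": ["l"]}

def build_confusion_network(canonical, confusions):
    """Return one slot per canonical unit.

    Each slot lists the canonical unit first, followed by its
    alternatives from the confusion table (if any).
    """
    return [[unit] + list(confusions.get(unit, [])) for unit in canonical]

def enumerate_pronunciations(network):
    """List every unit sequence the network allows."""
    return [list(choice) for choice in product(*network)]

# Canonical units for a two-syllable example, written pinyin-style.
canonical = ["zh", "i", "d", "ao"]
network = build_confusion_network(canonical, CONFUSIONS)
pronunciations = enumerate_pronunciations(network)
print(network)
print(pronunciations)
```

An aligner that uses such a network could score every allowed sequence, or every alternative within each slot, and keep the best-scoring one, so that an accented pronunciation is aligned against the unit actually spoken and not forced onto the canonical unit.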