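Automatic Segmentation of Confusing Syllables and Highly Coarticulated Syllables in Mandarin Chinese | National Tsing Hua University Electronic Theses and Dissertations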


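Graduate student:	林宏俊 Hong-Jyun Lin
Thesis title:	華語混淆音與耦合音之自動切分 Automatic Segmentation of Confusing Syllables and Highly Coarticulated Syllables in Mandarin Chinese
Advisor:	張智星 Jyh-Shing Roger Jang
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication:	2008
Academic year of graduation:	96 (ROC calendar)
Language:	Chinese
Number of pages:	52
Keywords (translated from Chinese):	confusing syllables, highly coarticulated syllables, automatic segmentation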


Abstract
Automatic phoneme segmentation is currently performed with a technique called “forced alignment”. Its advantage is that it can label the boundaries between phonemes efficiently across a massive number of audio files. However, forced alignment can produce inaccurate or erroneous boundary labels. An analysis of these error cases shows that the usual causes are a noisy recording environment, inappropriate recording devices, or mispronunciation resulting from the speaker's unfamiliarity with the recorded language. In addition, traditional segmentation methods perform poorly in the specific case where the last phoneme of one character in a term is the same as the first phoneme of the next character.
This thesis addresses these problems with two methods. For inaccuracy caused by a strong first-language accent, it proposes Automatic Generation of Pronunciation Confusion Network (AGPCN), which combines the forced alignment algorithm with a pronunciation confusion network (PCN). For the case where two adjacent characters are joined by the same phoneme, it uses tonal features in conjunction with the forced alignment algorithm. Experiments show that accuracy increases when the two proposed methods are applied.
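The idea behind forced alignment can be illustrated with a minimal sketch. It assumes that per-frame log-likelihoods for each phoneme of the known transcript are already available; in practice they would come from acoustic models, which this record does not describe. The function name `forced_align` and the toy scores are hypothetical, and the sketch is not the thesis's implementation.

```python
import math

def forced_align(loglik):
    """Viterbi alignment of T frames to N phonemes in a fixed order.

    loglik[t][j] is the (assumed, precomputed) log-likelihood of frame t
    under the j-th phoneme of the known transcript. Each phoneme must
    cover at least one frame, and phonemes may not be skipped.
    Returns (start frame of each phoneme, total log-likelihood).
    """
    T, N = len(loglik), len(loglik[0])
    if T < N:
        raise ValueError("need at least one frame per phoneme")
    NEG = -math.inf
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = loglik[0][0]          # the path must start in phoneme 0
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]                        # remain in phoneme j
            move = score[t - 1][j - 1] if j > 0 else NEG  # advance from j-1
            if move > stay:
                score[t][j] = move + loglik[t][j]
                back[t][j] = j - 1
            else:
                score[t][j] = stay + loglik[t][j]
                back[t][j] = j
    # Trace back from the last phoneme at the last frame.
    j = N - 1
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    starts = [path.index(k) for k in range(N)]
    return starts, score[T - 1][N - 1]

# Toy scores: frames 0-2 fit phoneme 0 better, frames 3-4 fit phoneme 1.
toy = [[-1, -5], [-1, -5], [-1, -5], [-5, -1], [-5, -1]]
starts, total = forced_align(toy)
```

With the toy scores, the boundary between the two phonemes falls at frame 3. When two adjacent phonemes have nearly identical scores on every frame, as in the same-phoneme case the abstract mentions, the boundary is poorly determined by these scores alone, which is the situation the tonal features are meant to help with.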

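Chinese Abstract (translated)
Conventional automatic segmentation currently relies on forced alignment. Its advantage is that it can mark the boundaries in audio content quickly and in large volume, but the phonetic boundaries it produces are often wrong or not accurate enough. Further analysis of these cases shows that the usual causes are an inadequate recording environment or recording equipment, or a speaker who is not familiar enough with the language being recorded, so that the accent in the recording is not accurate enough. In addition, because of how they are pronounced, certain specific words and phrases (such as 蘇武 (Sūwǔ), 回憶 (huíyì), and 記憶 (jìyì)) are generally segmented poorly by conventional methods. All of these factors degrade segmentation and reduce its accuracy.
This thesis proposes two experimental methods to address these problems. Where a speaker's speech may carry a noticeable native-language accent that degrades segmentation, we combine conventional forced alignment with the concept of a pronunciation confusion network (PCN) and propose segmentation by Automatic Generation of Pronunciation Confusion Network (AGPCN). Where conventional segmentation generally performs poorly on certain specific words and phrases, we add tonal features to conventional forced alignment and propose tonal feature segmentation. The goal is for segmentation accuracy to improve once these two methods are implemented.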
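One way to picture a pronunciation confusion network is as a list of slots, one per canonical phoneme, where each slot also admits the alternatives that speakers frequently produce instead. The sketch below ranks confusions by relative frequency and keeps those above a threshold, loosely mirroring the ranking table and threshold setting named in the table of contents. The function names, phoneme labels, counts, and threshold value are all hypothetical, and the thesis's actual generation procedure is not given in this record.

```python
from collections import Counter

def rank_confusions(pairs):
    """pairs: iterable of (canonical, recognized) phoneme labels.

    Returns {canonical: [(recognized, rate), ...]} sorted by descending
    rate, excluding the cases where the canonical phoneme was recognized.
    """
    counts = {}
    for canon, recog in pairs:
        counts.setdefault(canon, Counter())[recog] += 1
    ranking = {}
    for canon, c in counts.items():
        total = sum(c.values())
        ranking[canon] = [(r, n / total)
                          for r, n in c.most_common() if r != canon]
    return ranking

def build_network(transcript, ranking, threshold):
    """Each slot holds the canonical phoneme followed by every
    alternative whose confusion rate is at least `threshold`."""
    network = []
    for ph in transcript:
        alts = [r for r, rate in ranking.get(ph, []) if rate >= threshold]
        network.append([ph] + alts)
    return network

# Hypothetical observations: "zh" is produced as "z" 3 times out of 10.
observed = ([("zh", "zh")] * 6 + [("zh", "z")] * 3 +
            [("zh", "j")] * 1 + [("sh", "sh")] * 4)
ranking = rank_confusions(observed)
network = build_network(["zh", "sh"], ranking, threshold=0.2)
```

An aligner could then score every alternative in a slot and keep the best-scoring one, so that an accented pronunciation is matched against the phoneme actually spoken rather than the canonical one.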

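Table of Contents
Abstract
Chinese Abstract
Table of Contents
List of Tables
Chapter 1.    Introduction
1.    Research Motivation
2.    Related Work
3.    Research Overview
4.    Chapter Outline
Chapter 2.    Research Methods
1.    Study 1: Segmentation by Automatic Generation of Pronunciation Confusion Network
1.1.    Research Subjects
1.2.    Principles
1.3.    Experimental Steps
1.4.    Overall Architecture Diagram
1.5.    Experimental Flowchart
2.    Study 2: Tonal Feature Segmentation
2.1.    Research Subjects
2.2.    Principles
2.3.    Experimental Steps
2.4.    Experimental Flowchart
Chapter 3.    Experimental Results and Discussion
1.    Segmentation by Automatic Generation of Pronunciation Confusion Network
1.1.    Corpus Description and Model Parameter Settings
1.2.    Segmentation Methods
1.3.    Experiment 1: Corpus of Japanese Speakers Speaking Mandarin
1.3.1.    Segmentation Corpus
1.3.2.    Confusing Syllable Ranking Table
1.3.3.    Confusing Syllable Threshold Setting
1.3.4.    Experimental Results: Log probability
1.3.5.    Experimental Results: Segmentation Accuracy
1.3.6.    Analysis of Experimental Results
1.4.    Experiment 2: Corpus of Vietnamese Speakers Speaking Mandarin
1.4.1.    Segmentation Corpus
1.4.2.    Confusing Syllable Ranking Table
1.4.3.    Confusing Syllable Threshold Setting
1.4.4.    Error Analysis
1.4.5.    Experimental Results: Log probability
1.4.6.    Experimental Results: Segmentation Accuracy
2.    Tonal Feature Segmentation
2.1.    Corpus Description and Model Parameter Settings
2.2.    Segmentation Corpus
2.3.    Experimental Results
2.3.1.    Tonal syllable
2.3.2.    Initial (mono) tonal - final
2.3.3.    Initial (bi) tonal - final
2.3.4.    Tonal feature (All)
2.4.    Analysis of Experimental Results
Chapter 4.    Conclusion and Future Work
References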

                                


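Full-text release date: this full text is not authorized for public release (on-campus network)
Full-text release date: this full text is not authorized for public release (off-campus network)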
