
Graduate Student: 林宏俊 (Hong-Jyun Lin)
Thesis Title: 華語混淆音與耦合音之自動切分 (Automatic Segmentation of Confusing Syllables and Highly Coarticulated Syllables in Mandarin Chinese)
Advisor: 張智星 (Jyh-Shing Roger Jang)
Committee Members:
Degree: Master
Department: Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science (電機資訊學院 - 資訊系統與應用研究所)
Year of Publication: 2008
Graduation Academic Year: 96 (ROC calendar, 2007-2008)
Language: Chinese
Number of Pages: 52
Chinese Keywords: confusing syllables (混淆音), coarticulated syllables (耦合音), automatic segmentation (自動切分)
Abstract
Currently, automatic phoneme segmentation is performed with a technique called "forced alignment." Its advantage is that it can efficiently label the boundaries between phonemes in a massive amount of audio files. However, forced alignment can produce inaccurate or erroneous boundary labels. Analysis of these error cases shows that a noisy recording environment, inappropriate recording devices, or mispronunciation resulting from the speaker's unfamiliarity with the recorded language often causes the forced-alignment algorithm to label boundaries incorrectly. In addition, traditional segmentation methods do not perform well in the specific condition where the last phoneme of one character in a word is the same as the first phoneme of the next character.
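For readers unfamiliar with the technique, the minimal sketch below illustrates the boundary-labeling idea behind forced alignment. It assumes per-frame log-likelihoods for each phoneme of the known transcription are already available and uses a simplified one-state-per-phoneme Viterbi pass; the function name, data layout, and topology are illustrative assumptions, not the implementation used in the thesis (whose references point to HTK-style HMM systems).

import numpy as np

def force_align(frame_loglik: np.ndarray) -> list[int]:
    """frame_loglik[t, i]: log-likelihood of frame t under phoneme i of the
    transcription (phonemes must be visited in order).
    Returns, for each phoneme, the index of its first frame."""
    T, N = frame_loglik.shape
    NEG = -np.inf
    score = np.full((T, N), NEG)
    back = np.zeros((T, N), dtype=int)   # 0 = stayed in same phoneme, 1 = entered from previous phoneme
    score[0, 0] = frame_loglik[0, 0]
    for t in range(1, T):
        for i in range(N):
            stay = score[t - 1, i]
            move = score[t - 1, i - 1] if i > 0 else NEG
            if move > stay:
                score[t, i], back[t, i] = move + frame_loglik[t, i], 1
            else:
                score[t, i], back[t, i] = stay + frame_loglik[t, i], 0
    # Trace back from the last phoneme at the last frame to recover boundaries.
    starts = [0] * N
    i = N - 1
    for t in range(T - 1, 0, -1):
        if back[t, i] == 1:
            starts[i] = t
            i -= 1
    return starts

Note that when two adjacent phonemes are acoustically identical, the per-frame scores give the traceback little evidence for where to place the transition, which is exactly the failure mode the abstract describes.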
This thesis proposes two methods to address these problems. For inaccuracies caused by a strong first-language accent, we propose Automatic Generation of Pronunciation Confusion Networks (AGPCN), which combines the forced-alignment algorithm with a pronunciation confusion network (PCN). For the case where two adjacent characters are joined by the same phoneme, tonal features are used in conjunction with the forced-alignment algorithm. Experiments show that segmentation accuracy increases when the two proposed methods are applied.
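The abstract describes AGPCN only at a high level, so the following hedged sketch illustrates one plausible reading of it: confusion pairs observed for a speaker group are counted, pairs above a threshold define a pronunciation confusion network, and the transcription is expanded into the alternative pronunciations that forced alignment may then score. All function names, the data layout, and the example confusion counts below are hypothetical.

from collections import Counter
from itertools import product

def build_pcn(confusions: Counter, threshold: int) -> dict[str, set[str]]:
    """confusions maps (canonical, produced) phoneme pairs to observed counts."""
    pcn: dict[str, set[str]] = {}
    for (canon, produced), count in confusions.items():
        if count >= threshold and canon != produced:
            pcn.setdefault(canon, {canon}).add(produced)
    return pcn

def expand_transcription(phones: list[str], pcn: dict[str, set[str]]) -> list[tuple[str, ...]]:
    """Enumerate the alternative phoneme sequences allowed by the confusion network."""
    choices = [sorted(pcn.get(p, {p})) for p in phones]
    return list(product(*choices))

# Hypothetical confusion counts for one speaker group.
confusions = Counter({("zh", "z"): 12, ("sh", "s"): 9, ("r", "l"): 2})
pcn = build_pcn(confusions, threshold=5)
print(expand_transcription(["zh", "u"], pcn))   # [('z', 'u'), ('zh', 'u')]

Under this reading, the aligner scores each expanded variant and keeps the best one, so a strongly accented realization can still receive well-placed boundaries.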

Chinese Abstract
Traditional automatic syllable segmentation is performed by forced alignment. Its advantage is that it can quickly label the boundaries in large amounts of audio, but forced alignment frequently produces boundary labels that are wrong or insufficiently accurate. Further analysis of these cases shows that the causes are usually an imperfect recording environment or recording equipment, or pronunciation errors arising from the speaker's unfamiliarity with the language being recorded. In addition, because of the way certain specific words are pronounced (e.g., 蘇武, 回憶, 記憶), traditional segmentation generally performs poorly on them. All of these factors degrade segmentation quality and reduce segmentation accuracy.
Based on the problems above, this thesis proposes two methods for improvement. For recordings whose pronunciation may carry a noticeable first-language accent, which degrades segmentation, we combine traditional forced-alignment segmentation with the concept of a pronunciation confusion network (PCN) and propose segmentation with Automatic Generation of Pronunciation Confusion Networks (AGPCN). For the specific words on which traditional segmentation generally performs poorly, we add tonal features to traditional forced-alignment segmentation and propose tonal-feature segmentation. The goal is to improve segmentation accuracy by applying these two methods.
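As a rough illustration of the second method, the sketch below (assuming the librosa library is available) appends a log-F0 tonal feature to frame-level MFCCs, so that a boundary between two characters sharing the same phoneme (as in 蘇武, 回憶, 記憶) can still be located where the tone contour changes. The feature choice, parameters, and function name are assumptions for illustration; the record does not specify the thesis's actual feature extraction.

import numpy as np
import librosa

def frames_with_tone(wav_path: str, hop_length: int = 512) -> np.ndarray:
    """Return frame-level features: 13 MFCCs plus one log-F0 tonal dimension."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop_length)
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr, hop_length=hop_length)
    # Unvoiced frames get 0; voiced frames get log pitch, so tone contours are visible to the aligner.
    log_f0 = np.where(voiced, np.log(np.nan_to_num(f0, nan=1.0)), 0.0)
    n = min(mfcc.shape[1], log_f0.shape[0])
    return np.vstack([mfcc[:, :n], log_f0[:n][np.newaxis, :]]).T   # shape: (frames, 14)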


Table of Contents
Abstract
Chinese Abstract
Table of Contents
List of Tables
Chapter 1. Introduction
    1.1 Motivation
    1.2 Related Work
    1.3 Research Overview
    1.4 Chapter Outline
Chapter 2. Methods
    2.1 Study 1: Segmentation with Automatic Generation of Pronunciation Confusion Networks
        2.1.1 Target of Study
        2.1.2 Principle
        2.1.3 Experimental Procedure
        2.1.4 Overall Architecture
        2.1.5 Experimental Flowchart
    2.2 Study 2: Segmentation with Tonal Features
        2.2.1 Target of Study
        2.2.2 Principle
        2.2.3 Experimental Procedure
        2.2.4 Experimental Flowchart
Chapter 3. Experimental Results and Discussion
    3.1 Segmentation with Automatic Generation of Pronunciation Confusion Networks
        3.1.1 Corpus and Model Parameter Settings
        3.1.2 Segmentation Methods
        3.1.3 Experiment 1: Mandarin Spoken by Japanese Speakers
            3.1.3.1 Segmentation Corpus
            3.1.3.2 Confusing-Syllable Ranking
            3.1.3.3 Confusing-Syllable Threshold Setting
            3.1.3.4 Results: Log Probability
            3.1.3.5 Results: Segmentation Accuracy
            3.1.3.6 Analysis of Results
        3.1.4 Experiment 2: Mandarin Spoken by Vietnamese Speakers
            3.1.4.1 Segmentation Corpus
            3.1.4.2 Confusing-Syllable Ranking
            3.1.4.3 Confusing-Syllable Threshold Setting
            3.1.4.4 Error Analysis
            3.1.4.5 Results: Log Probability
            3.1.4.6 Results: Segmentation Accuracy
    3.2 Segmentation with Tonal Features
        3.2.1 Corpus and Model Parameter Settings
        3.2.2 Segmentation Corpus
        3.2.3 Results
            3.2.3.1 Tonal Syllable
            3.2.3.2 Initial (Mono) Tonal-Final
            3.2.3.3 Initial (Bi) Tonal-Final
            3.2.3.4 Tonal Feature (All)
        3.2.4 Analysis of Results
Chapter 4. Conclusions and Future Work
References

References
[1] Ronen, O., Neumeyer, L., and Franco, H., "Automatic detection of mispronunciation for language instruction," in Proc. Eurospeech, 1997, pp. 649-652.
[2] Hide, O., "Interlanguage phonology: implications for a remedial pronunciation course for Chinese learners of English," in Antwerp Papers in Linguistics, 2002, pp. 17-46.
[3] Witt, S. M. and Young, S. J., "Off-line acoustic modeling of non-native accents," in Proc. Eurospeech, 1999, pp. 1367-1370.
[4] Tsubota, Y., Kawahara, T., and Dantsuji, M., "CALL system for Japanese students of English using pronunciation error prediction and formant structure estimation," in Proc. InSTIL, 2002.
[5] Tang, S.-M., "Error pattern analysis for computer assisted English pronunciation learning," thesis, National Cheng Kung University (NCKU), 2005.
[6] Hsu, W.-T., "Error-spotting in pronunciation of English vowels based on speech recognition technologies," thesis, National Tsing Hua University (NTHU), 2005.
[7] Forney, G. D., Jr., "The Viterbi algorithm," Proceedings of the IEEE, 1973.
[8] Jang, J.-S. R., "Audio Signal Processing and Recognition," http://neural.cs.nthu.edu.tw/jang/books/audioSignalProcessing/
[9] Jang, J.-S. R., "Data Clustering and Pattern Recognition," http://neural.cs.nthu.edu.tw/jang/books/dcpr/
[10] TCC-300 Corpus, http://www.aclclp.org.tw/use_mat.php#tcc300edu
[11] 王小川, Speech Signal Processing (語音訊號處理), 全華科技圖書股份有限公司, Taipei, 2004.
[12] The HTK Book (for HTK Version 3.4), Cambridge University Engineering Department, 2001-2006.
[13] "Modeling pronunciation variation for ASR: A survey of the literature."
[14] WaveSurfer, http://www.speech.kth.se/wavesurfer/download.html

Full-text availability (campus network): not authorized for public access.
Full-text availability (off-campus network): not authorized for public access.
