Graduate student: | 陳冠廷 Kuan-Ting Chen |
---|---|
Thesis title: | 基於混合式方法的華語語料庫之自動切音研究 A Hybrid Approach to Automatic Speech Segmentation for Mandarin Speech Corpora |
Advisor: | 張智星 Jyh-Shing Roger Jang |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2005 |
Academic year of graduation: | 93 |
Language: | English |
Number of pages: | 47 |
Keywords: | automatic segmentation, phonetic labeling, HMM-based recognizer, sequential forward selection, k-nearest neighbor rule, leave-one-out |
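Precise phonetic labeling is essential for corpus-based text-to-speech (TTS) systems. However, automatic segmentation by Viterbi forced alignment is not accurate enough, and an automatic labeling method that suits one language cannot be applied wholesale to another. We therefore propose a new boundary refinement method for Mandarin speech corpora.
The method divides Mandarin speech sounds into four broad categories according to their phonetic characteristics. For each combination of categories that can meet at a boundary, we use pattern recognition techniques to select suitable acoustic features and refine the boundary separately. Boundary refinement performs poorly for the connected-speech case (“periodic voiced + periodic voiced”), so we handle this case specially with new formant-based features.
To verify the feasibility of the proposed method, we compare it with an earlier boundary refinement approach based on CART (Classification and Regression Tree) and report extensive experimental results. According to these experiments, the proposed boundary refinement method achieves quite high segmentation accuracy.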
Precise phone/syllable boundary labeling of the utterances in a speech corpus plays an important role in constructing a corpus-based TTS (text-to-speech) system. However, automatic labeling based on Viterbi forced alignment does not always produce satisfactory results. Moreover, a labeling method that suits one language does not necessarily produce desirable results for another. Hence, in this thesis we propose a new procedure for refining the boundaries of utterances in a Mandarin speech corpus. This procedure employs different sets of acoustic features for four different phonetic categories. In addition, a new scheme is proposed to deal with the “periodic voiced + periodic voiced” case, which produced most of the segmentation errors in our experiment. Several experiments were conducted to demonstrate the feasibility of the proposed approach.
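
The keywords name sequential forward selection, the k-nearest neighbor rule, and leave-one-out, but the abstract does not say how the thesis combines them. The following is only a minimal sketch of one common combination: features are added greedily, and each candidate subset is scored by the leave-one-out accuracy of a k-NN classifier. The function names and the toy data are hypothetical and are not taken from the thesis.

```python
import math


def loo_knn_accuracy(X, y, features, k=1):
    """Leave-one-out accuracy of a k-NN classifier using only `features`."""
    correct = 0
    for i in range(len(X)):
        # Distances from the held-out sample i to every other sample.
        dists = []
        for j in range(len(X)):
            if j == i:
                continue
            d = math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in features))
            dists.append((d, y[j]))
        dists.sort(key=lambda t: t[0])
        votes = [label for _, label in dists[:k]]
        # Majority vote; sorting the label set makes tie-breaking deterministic.
        predicted = max(sorted(set(votes)), key=votes.count)
        if predicted == y[i]:
            correct += 1
    return correct / len(X)


def sequential_forward_selection(X, y, k=1):
    """Greedily add the feature that most improves leave-one-out accuracy.

    Stops as soon as no remaining feature improves the score.
    """
    remaining = list(range(len(X[0])))
    selected, best_score = [], 0.0
    while remaining:
        scored = [(loo_knn_accuracy(X, y, selected + [f], k), f)
                  for f in remaining]
        score, feature = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data (made up): feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
     [1.0, 4.8], [1.1, 1.2], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]

selected, score = sequential_forward_selection(X, y, k=1)
print(selected, score)
```

In a boundary refinement setting, each row of `X` might hold acoustic features measured near a candidate boundary. That mapping is an assumption here, not something the abstract specifies.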