
Student: Kuan-Ting Chen (陳冠廷)
Thesis Title: A Hybrid Approach to Automatic Speech Segmentation for Mandarin Speech Corpora (基於混合式方法的華語語料庫之自動切音研究)
Advisor: Jyh-Shing Roger Jang (張智星)
Degree: Master
Department: Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science
Year of Publication: 2005
Academic Year of Graduation: 93 (ROC calendar, i.e., 2004-2005)
Language: English
Number of Pages: 47
Keywords: automatic segmentation, phonetic labeling, HMM-based recognizer, sequential forward selection, k-nearest neighbor rule, leave-one-out
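    Accurate phonetic labeling is essential for corpus-based text-to-speech (TTS) systems. However, automatic segmentation by Viterbi forced alignment is not sufficiently precise, and an automatic labeling method suited to one language cannot always be carried over to another. We therefore propose a new boundary refinement method for Mandarin speech corpora.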
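    The method divides Mandarin phones into four major categories according to their phonetic characteristics. For each combination of categories that can meet at a boundary, we use pattern recognition to select suitable acoustic features and refine those boundaries separately. Refinement of boundaries between two periodic voiced segments (“periodic voiced + periodic voiced”) was unsatisfactory, so we handle this class specially with new formant-based features.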
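The pattern-recognition feature selection described above is named in the table of contents as "Feature selection by SFS, KNNR and LOO": sequential forward selection (SFS) greedily adds the acoustic feature that most improves the leave-one-out (LOO) accuracy of a k-nearest-neighbor (KNN) classifier. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, k = 3, squared Euclidean distance, and the stopping rule (stop when no remaining feature improves accuracy) are all assumptions, and `X`/`y` stand for hypothetical feature vectors of labeled candidate boundaries and their class labels.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def knn_loo_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     features: Sequence[int], k: int = 3) -> float:
    """Leave-one-out accuracy of a k-NN classifier that uses only the given feature columns."""
    n = len(X)
    correct = 0
    for i in range(n):
        # Hold out sample i and rank all other samples by squared Euclidean distance.
        neighbors = sorted(
            ((sum((X[i][f] - X[j][f]) ** 2 for f in features), y[j])
             for j in range(n) if j != i),
            key=lambda pair: pair[0],
        )
        votes = Counter(label for _, label in neighbors[:k])
        # most_common(1) keeps first-seen order on ties, so the label seen nearest wins.
        predicted = votes.most_common(1)[0][0]
        correct += int(predicted == y[i])
    return correct / n


def sequential_forward_selection(X: Sequence[Sequence[float]], y: Sequence[int],
                                 k: int = 3) -> Tuple[List[int], float]:
    """Greedy SFS: add the feature that most improves LOO k-NN accuracy until none helps."""
    remaining = list(range(len(X[0])))
    selected: List[int] = []
    best_acc = 0.0
    while remaining:
        # Score every candidate feature together with the features already selected.
        scored = [(knn_loo_accuracy(X, y, selected + [f], k), f) for f in remaining]
        # Highest accuracy wins; ties go to the lower feature index.
        acc, f = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break  # no remaining feature improves accuracy
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc
```

Because Euclidean distance is scale-sensitive, real acoustic features would normally be normalized before running this search.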
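    To verify the feasibility of the proposed method, we compare it with an earlier CART (Classification and Regression Tree)-based boundary refinement method and report extensive experimental comparisons. According to the experimental results, our boundary refinement method achieves fairly high segmentation accuracy.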
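Segmentation accuracy in comparisons like this one is commonly reported as the percentage of automatically placed boundaries that fall within a fixed tolerance of the manually labeled boundaries. The sketch below assumes that definition, boundary times in seconds paired one-to-one with the reference, and a default tolerance of 20 ms; none of these details is taken from the thesis, and the function name is hypothetical.

```python
from typing import Sequence


def boundary_accuracy(predicted: Sequence[float], reference: Sequence[float],
                      tolerance: float = 0.020) -> float:
    """Fraction of boundaries whose predicted time is within `tolerance` seconds of the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must list the same boundaries in order")
    if not reference:
        raise ValueError("no boundaries to score")
    # Count a hit whenever the absolute timing error does not exceed the tolerance.
    hits = sum(abs(p - r) <= tolerance for p, r in zip(predicted, reference))
    return hits / len(reference)
```

Scoring forced-alignment boundaries and refined boundaries against the same manual labels with this function gives directly comparable accuracy figures for the two procedures.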


    Precise phone/syllable boundary labeling of the utterances in a speech corpus plays an important role in constructing a corpus-based TTS (text-to-speech) system. However, automatic labeling based on Viterbi forced alignment does not always produce satisfactory results. Moreover, a labeling method that suits one language does not necessarily produce desirable results for another. In this thesis, we therefore propose a new procedure for refining the phone/syllable boundaries of utterances in a Mandarin speech corpus. The procedure groups phones into four phonetic categories and refines each boundary with a set of acoustic features selected for the corresponding category transition. In addition, a new scheme is proposed for the “periodic voiced + periodic voiced” case, which produced most of the segmentation errors in our experiments. Several experiments, including a comparison with a CART-based refinement procedure, demonstrate the feasibility of the proposed approach.
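The Viterbi forced alignment that supplies the initial boundaries can be illustrated with a small dynamic-programming sketch. It assumes the HMM states of the transcribed phones are concatenated in transcription order, that each frame either stays in the current state or advances to the next one, and that transition probabilities can be ignored; `loglik` is a hypothetical matrix of per-frame state log-likelihoods that trained acoustic models would provide. A real HMM toolkit also handles transition probabilities and optional pauses, which this sketch omits.

```python
import math
from typing import List, Sequence


def forced_align(loglik: Sequence[Sequence[float]]) -> List[int]:
    """Viterbi alignment of T frames to S left-to-right states; returns the state of each frame."""
    T, S = len(loglik), len(loglik[0])
    if T < S:
        raise ValueError("need at least one frame per state")
    NEG = -math.inf
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = loglik[0][0]  # the alignment must start in the first state
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG
            # Keep the better predecessor: remain in state s or advance from s - 1.
            if move > stay:
                score[t][s], back[t][s] = move + loglik[t][s], s - 1
            else:
                score[t][s], back[t][s] = stay + loglik[t][s], s
    # The alignment must end in the last state; trace the back-pointers from there.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def boundaries(path: Sequence[int]) -> List[int]:
    """Frame indices at which a new state (segment) begins."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

For a toy 6-frame, 3-state input whose frames 0-1, 2-3 and 4-5 score best under states 0, 1 and 2 respectively, the path is `[0, 0, 1, 1, 2, 2]` and the boundaries fall at frames 2 and 4. A refinement procedure then searches near each such boundary for a better position.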

    1 INTRODUCTION
      1.1 MOTIVATION
      1.2 RELATED WORK
      1.3 SUMMARY OF THE THESIS
      1.4 ORGANIZATION OF THE THESIS
    2 HMM BASED RECOGNIZER
      2.1 SPEECH CORPUS INTRODUCTION
      2.2 FROM ORTHOGRAPHIC TRANSCRIPTION TO PHONETIC TRANSCRIPTION
      2.3 TRAINING DIFFERENT ACOUSTIC MODELS OF HMM-BASED RECOGNIZERS
    3 DESIGN OF THE REFINEMENT PROCEDURE
      3.1 FOUR PHONETIC CATEGORIES
      3.2 FEATURE DEFINITION
        3.2.1 Bisector Frequency
        3.2.2 Burst Degree
      3.3 FEATURE SELECTION BASED ON PHONETIC CATEGORIES
        3.3.1 Defining candidate boundaries for training data
        3.3.2 Feature definition
        3.3.3 Feature selection by SFS, KNNR and LOO
        3.3.4 Classification rates for phonetic category transitions
      3.4 FURTHER IMPROVEMENT FOR “PERIODIC VOICED + PERIODIC VOICED” CASES
        3.4.1 “Divide and conquer” method
        3.4.2 A heuristic approach for group W
    4 EXPERIMENT RESULTS AND ERROR ANALYSIS
      4.1 THE PERFORMANCE OF DIFFERENT ACOUSTIC MODES FOR LABELING THE TRAIN-455 CORPUS
      4.2 A COMPARISON OF THE SEGMENTATION ACCURACY BETWEEN FORCED ALIGNMENT AND OUR REFINEMENT PROCEDURE
      4.3 A COMPARISON WITH CART-BASED REFINEMENT PROCEDURE AND OUR REFINEMENT PROCEDURE
        4.3.1 CART-Based Refinement Procedure
        4.3.2 The Comparison between our refinement method and CART-based method
      4.4 RESULTS AND DISCUSSIONS
    5 CONCLUSIONS
    A APPENDIX
      A.1 AN OVERVIEW OF THE CART METHODOLOGY
        A.1.1 Splitting Rules
        A.1.2 Class assignment
        A.1.3 Decide when to stop splitting
      A.2 DISCUSSION
    BIBLIOGRAPHY
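Section 3.2.1 lists a "bisector frequency" feature whose definition is not given in this record. One plausible reading, assumed here, is the frequency that splits a frame's spectral energy into two equal halves; such a value would tend to be high for fricative noise and low for vowels. The sketch below computes it with a naive DFT in pure Python, which is adequate for short frames; the half-energy definition and the function names are assumptions, not the thesis's formula.

```python
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power of DFT bins 0..N/2 via a naive O(N^2) DFT (fine for short analysis frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec


def bisector_frequency(frame: Sequence[float], sample_rate: float) -> float:
    """Frequency (Hz) of the first bin at which cumulative spectral energy reaches half the total."""
    spec = power_spectrum(frame)
    total = sum(spec)
    if total == 0:
        return 0.0  # silent frame: no energy to split
    running = 0.0
    for k, p in enumerate(spec):
        running += p
        if running >= total / 2:
            return k * sample_rate / len(frame)
    return (len(spec) - 1) * sample_rate / len(frame)  # fallback; normally unreachable
```

With a 64-sample frame at 8 kHz, each DFT bin spans 125 Hz, so a pure tone at bin 8 yields a bisector frequency of 1000 Hz.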


    Full-text release date: the full text is not authorized for public release (on-campus or off-campus network).
