A Hybrid Approach to Automatic Speech Segmentation for Mandarin Speech Corpora (基於混合式方法的華語語料庫之自動切音研究)


Graduate student:	陳冠廷 Kuan-Ting Chen
Thesis title:	基於混合式方法的華語語料庫之自動切音研究 A Hybrid Approach to Automatic Speech Segmentation for Mandarin Speech Corpora
Advisor:	張智星 Jyh-Shing Roger Jang
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication:	2005
Academic year of graduation:	93 (ROC calendar)
Language:	English
Number of pages:	47
Keywords:	automatic segmentation, phonetic labeling, HMM-based recognizer, sequential forward selection, k-nearest neighbor rule, leave-one-out

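Precise phonetic labeling is essential for corpus-based text-to-speech (TTS) systems. However, automatic segmentation by Viterbi forced alignment is not accurate enough, and a labeling method that suits one language cannot always be applied to another. We therefore propose a new boundary refinement method for Mandarin speech corpora.
The method divides Mandarin speech sounds into four broad categories according to their phonetic characteristics. For each combination of categories at a boundary, we use pattern recognition techniques to select suitable acoustic features, and boundaries of each combination are refined separately. Refinement results for the connected-speech category (“periodic voiced + periodic voiced”) were unsatisfactory, so this category is handled specially with new formant-based features.
To verify the feasibility of the proposed method, we compare it with an earlier boundary refinement method based on CART (Classification and Regression Tree) and report extensive experimental results. According to these experiments, the proposed boundary refinement achieves high segmentation accuracy.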
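
The Viterbi forced alignment that serves as the baseline can be illustrated with a minimal sketch. It assumes one state per phonetic unit, no transition probabilities, and a precomputed matrix of frame log-likelihoods; the names `forced_align`, `boundaries`, and the toy matrix `demo_log_lik` are hypothetical and do not come from the thesis, which uses an HMM-based recognizer.

```python
import numpy as np

def forced_align(log_lik):
    """Align T frames to S units whose order is already known.

    log_lik[t, s] is the log-likelihood of frame t under unit s.
    The path starts in unit 0, ends in unit S-1, and at each frame
    either stays in the current unit or advances to the next one.
    """
    T, S = log_lik.shape
    if T < S:
        raise ValueError("need at least one frame per unit")
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)  # 0 = stayed, 1 = advanced
    score[0, 0] = log_lik[0, 0]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1, s]
            move = score[t - 1, s - 1] if s > 0 else -np.inf
            if move > stay:
                score[t, s] = move + log_lik[t, s]
                back[t, s] = 1
            else:
                score[t, s] = stay + log_lik[t, s]
    # Trace the best path backwards from the last frame of the last unit.
    s = S - 1
    path = [s]
    for t in range(T - 1, 0, -1):
        s -= back[t, s]
        path.append(s)
    path.reverse()
    return path

def boundaries(path):
    """Frame indices at which the aligned unit changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]

# Toy input: 6 frames, 3 units; each pair of frames favors one unit.
demo_log_lik = np.full((6, 3), -5.0)
for frame, unit in enumerate([0, 0, 1, 1, 2, 2]):
    demo_log_lik[frame, unit] = 0.0

demo_path = forced_align(demo_log_lik)
demo_bounds = boundaries(demo_path)
```

The boundaries returned here are frame indices; a refinement step such as the one proposed in the thesis would then adjust each of them within a small neighborhood.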

Precise phone/syllable boundary labeling of the utterances in a speech corpus plays an important role in constructing a corpus-based TTS (text-to-speech) system. However, automatic labeling based on Viterbi forced alignment does not always produce satisfactory results. Moreover, a suitable labeling method for one language does not necessarily produce desirable results for another language. Hence in this thesis, we propose a new procedure for refining the boundaries of utterances in a Mandarin speech corpus. This procedure employs different sets of acoustic features for four different phonetic categories. In addition, a new scheme is proposed to deal with the “periodic voiced + periodic voiced” case, which produced most of the segmentation errors in our experiment. Several experiments were conducted to demonstrate the feasibility of the proposed approach.
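
The keywords name sequential forward selection (SFS), the k-nearest neighbor rule (KNNR), and leave-one-out (LOO) as the tools for choosing acoustic features. The sketch below shows one common way these three fit together; it is an illustration under stated assumptions, not the thesis implementation. The function names, the stop-when-no-improvement rule, and the toy data are all hypothetical.

```python
import numpy as np

def loo_knn_accuracy(X, y, k=1):
    """Leave-one-out accuracy of a k-nearest-neighbor classifier."""
    n = len(y)
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)   # a sample may not vote for itself
    correct = 0
    for i in range(n):
        nearest = np.argsort(dist[i], kind="stable")[:k]
        labels, counts = np.unique(y[nearest], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:
            correct += 1
    return correct / n

def sequential_forward_selection(X, y, k=1):
    """Greedily add the feature that most improves LOO accuracy.

    Assumption: selection stops as soon as no remaining feature
    improves the score; other stopping rules are possible.
    """
    selected, history = [], []
    remaining = list(range(X.shape[1]))
    while remaining:
        scored = [(loo_knn_accuracy(X[:, selected + [f]], y, k), f)
                  for f in remaining]
        best_acc, best_f = max(scored, key=lambda pair: pair[0])
        if history and best_acc <= history[-1][1]:
            break
        selected.append(best_f)
        remaining.remove(best_f)
        history.append((list(selected), best_acc))
    return selected, history

# Toy data: feature 0 separates the two classes, feature 1 does not.
demo_y = np.array([0] * 10 + [1] * 10)
demo_X = np.column_stack([
    10.0 * demo_y + 0.01 * np.arange(20),  # informative feature
    np.arange(20) % 2,                     # uninformative feature
]).astype(float)

demo_selected, demo_history = sequential_forward_selection(demo_X, demo_y)
```

In the setting described by the abstract, one such selection would presumably be run per phonetic category transition, with rows of `X` holding acoustic features measured at candidate boundaries; that mapping is an assumption of this sketch.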

INTRODUCTION
  1    MOTIVATION
  2    RELATED WORK
  3    SUMMARY OF THE THESIS
  4    ORGANIZATION OF THE THESIS
HMM BASED RECOGNIZER
  1    SPEECH CORPUS INTRODUCTION
  2    FROM ORTHOGRAPHIC TRANSCRIPTION TO PHONETIC TRANSCRIPTION
  3    TRAINING DIFFERENT ACOUSTIC MODELS OF HMM-BASED RECOGNIZERS
DESIGN OF THE REFINEMENT PROCEDURE
  1    FOUR PHONETIC CATEGORIES
  2    FEATURE DEFINITION
    2.1    Bisector Frequency
    2.2    Burst Degree
  3    FEATURE SELECTION BASED ON PHONETIC CATEGORIES
    3.1    Defining candidate boundaries for training data
    3.2    Feature definition
    3.3    Feature selection by SFS, KNNR and LOO
    3.4    Classification rates for phonetic category transitions
  4    FURTHER IMPROVEMENT FOR “PERIODIC VOICED + PERIODIC VOICED” CASES
    4.1    “Divide and conquer” method
    4.2    A heuristic approach for group W
EXPERIMENT RESULTS AND ERROR ANALYSIS
  1    THE PERFORMANCE OF DIFFERENT ACOUSTIC MODES FOR LABELING THE TRAIN-455 CORPUS
  2    A COMPARISON OF THE SEGMENTATION ACCURACY BETWEEN FORCED ALIGNMENT AND OUR REFINEMENT PROCEDURE
  3    A COMPARISON WITH CART-BASED REFINEMENT PROCEDURE AND OUR REFINEMENT PROCEDURE
    3.1    CART-Based Refinement Procedure
    3.2    The Comparison between our refinement method and CART-based method
  4    RESULTS AND DISCUSSIONS
CONCLUSIONS
A    APPENDIX
  A.1    AN OVERVIEW OF THE CART METHODOLOGY
    A.1.1    Splitting Rules
    A.1.2    Class assignment
    A.1.3    Decide when to stop splitting
  A.2    DISCUSSION
BIBLIOGRAPHY

                                

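Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)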
