| Graduate Student: | 蔡昀庭 Tsai, Yun-Ting |
|---|---|
| Thesis Title: | 基於隱藏式馬可夫模型之中文語音合成系統 (HMM-based Speech Synthesis for Mandarin Chinese) |
| Advisor: | 王小川 Wang, Hsiao-Chuan |
| Committee Members: | |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of Publication: | 2009 |
| Academic Year: | 97 |
| Language: | Chinese |
| Pages: | 65 |
| Keywords (Chinese): | hidden Markov model, Mandarin speech synthesis system, speech synthesis |
| Keywords (English): | HMM-based, speech synthesis, Chinese speech synthesis |
HMM-based Mandarin speech synthesis can produce speech of reasonably good quality from only a small amount of training data. Its greatest advantage lies in the flexibility of the parametric speech representation itself, which makes extended synthesis applications, such as adding emotional characteristics, voice conversion, and speaking styles, comparatively easy. Building a synthesis system for a different language likewise requires only minor modifications.
Because of these advantages, HMM-based Mandarin speech synthesis has been applied more and more widely in practical systems. However, making synthesized speech carry prosody and rhythm as lively as natural speech remains a major challenge in the field. This thesis improves the prosodic and rhythmic naturalness of the synthesized speech by augmenting each label with prosodic information about the phone's position within a three-level structure of sentence, word, and character.
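The position-augmented labels described above can be sketched as follows. The field layout, separators, and function below are illustrative assumptions in the spirit of HTS-style full-context labels, not the thesis's actual label specification:

```python
# Illustrative sketch of a position-augmented context label. The field names
# and format ("/A:", "/B:", "x" for missing context) are assumptions, not the
# thesis's actual label layout.

def make_label(phones, idx, syl_in_word, syls_in_word, word_in_sent, words_in_sent):
    """Build a context label for phones[idx] with positional prosodic features."""
    prev_p = phones[idx - 1] if idx > 0 else "x"
    next_p = phones[idx + 1] if idx < len(phones) - 1 else "x"
    return (f"{prev_p}-{phones[idx]}+{next_p}"
            f"/A:{syl_in_word}_{syls_in_word}"      # syllable position within the word
            f"/B:{word_in_sent}_{words_in_sent}")   # word position within the sentence

# e.g. the phone "a" in the second of two syllables of a word that is the
# first of three words in the sentence:
label = make_label(["n", "i", "h", "a", "o"], 3, 2, 2, 1, 3)
print(label)  # h-a+o/A:2_2/B:1_3
```

During training, decision-tree state clustering can then split on these positional fields, so that the same phone receives different prosodic models depending on where it sits in the word and sentence.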
This thesis further experiments with how the number of HMM states, the labeling scheme, and tonal prosodic information affect the synthesized speech. Adding the tone information of the preceding and following syllables effectively improves the tonal naturalness of the synthesized speech, and resolves the Mandarin pronunciation problem in which, for two consecutive third-tone syllables, the first syllable must be pronounced as second tone.
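The third-tone rule mentioned above is the well-known Mandarin tone sandhi. A minimal sketch of a single left-to-right pass is shown below; this simplified version is an illustration only, since the full rule also interacts with word boundaries and longer runs of third-tone syllables:

```python
# Minimal sketch of Mandarin third-tone sandhi: when two third-tone syllables
# are adjacent, the first is realized as second tone. A single left-to-right
# pass; real sandhi also depends on word grouping, which is not modeled here.

def apply_third_tone_sandhi(tones):
    """Rewrite a list of tone numbers so that a 3 directly before a 3 becomes 2."""
    out = list(tones)
    for i in range(len(out) - 1):
        if out[i] == 3 and out[i + 1] == 3:
            out[i] = 2
    return out

print(apply_third_tone_sandhi([3, 3]))     # [2, 3]  e.g. "ni hao"
print(apply_third_tone_sandhi([2, 3, 4]))  # [2, 3, 4]  unchanged
```

Applying such a rewrite to the tone fields before label generation means the acoustic models are trained on the tones as actually pronounced, rather than the underlying lexical tones.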
The text analyzer used in this thesis's synthesis system is rather rudimentary: its lexicon is the small lexicon built into Windows XP, and its algorithm is also fairly simple, so its word-segmentation accuracy is insufficient. As a result, prosodic control at the word level, in both training and synthesis, still leaves room for improvement. With a larger lexicon and a better algorithm, the prosody of the synthesized speech should improve further.