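Graduate student: | 徐培霖 HSU, PEI-LIN |
---|---|
Thesis title: | 基於特徵替換法對語者調適語音合成之改進 On the Use of Speech Feature Substitution for Speaker Adaption within HMM-based TTS |
Advisor: | 張智星 |
Oral defense committee: | 李俊仁, 林政源 |
Degree: | 碩士 Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2012 |
Academic year of graduation: | 100 (ROC calendar) |
Language: | Chinese |
Number of pages: | 52 |
Keywords (Chinese): | 語音合成 (speech synthesis), 語者調適 (speaker adaptation), 文字轉語音 (text-to-speech) |
Keywords (English): | speech synthesis, speaker adaptation, text-to-speech |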
This thesis implements an online Mandarin speech synthesis system with speaker adaptation and proposes a speech feature substitution method to improve the synthesized speech. The user enters the text to be synthesized; the system performs word segmentation and tone tagging on that text and then synthesizes speech with the acoustic model the user selects.
The system also provides speaker adaptation. The user records a few sentences through a web interface, and the system scores each recording against its prompt text to decide whether to accept it. Once recording is complete, a backend process automatically performs speaker adaptation and trains an acoustic model for that user.
In addition, the thesis proposes a feature substitution method to improve speech synthesized from speaker-adapted models. The method uses spectral features extracted from real speech segments in place of the spectral features estimated by the acoustic model, which increases the similarity between the synthesized speech and the target speaker's voice. In a mean opinion score (MOS) evaluation, the method scored 0.4 points higher than the original speaker-adapted synthesis.
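The substitution idea in the last paragraph can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `substitute_spectral_features`, the frame layout (one list of spectral coefficients per frame), and the assumption that each real-speech segment is already aligned to a start frame in the generated trajectory are all hypothetical choices made for illustration. The abstract does not say how segments are chosen or aligned, so those steps are left out.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # one frame of spectral coefficients (hypothetical layout)


def substitute_spectral_features(
    generated: Sequence[Frame],
    real_segments: Sequence[Tuple[int, Sequence[Frame]]],
) -> List[Frame]:
    """Return a copy of `generated` with selected frames overwritten.

    `generated` stands for the spectral trajectory estimated by the adapted
    acoustic model. Each item of `real_segments` is (start_frame, frames),
    where `frames` were extracted from a real recording of the target speaker.
    Assumption: the segments are already aligned to frame indices.
    """
    output = [list(frame) for frame in generated]  # leave the input untouched
    for start, frames in real_segments:
        if start < 0 or start + len(frames) > len(output):
            raise ValueError("segment falls outside the generated trajectory")
        for offset, frame in enumerate(frames):
            if len(frame) != len(output[start + offset]):
                raise ValueError("feature dimension mismatch")
            # Replace the model estimate with the real-speech frame.
            output[start + offset] = list(frame)
    return output


if __name__ == "__main__":
    generated = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]
    real = [(1, [[1.0, 1.5], [2.0, 2.5]])]
    print(substitute_spectral_features(generated, real))
```

Only the spectral stream is touched in this sketch; any other parameters produced by the model would pass through unchanged. A real system would also need to choose which segments to substitute and to handle the joins between substituted and model-generated frames, neither of which is attempted here.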