Graduate student: | 羅珝瑩 Lo, Hsu-Ying |
---|---|
Thesis title: | 根基於 HMM 之華語語音合成初步研究 An Initial Study on HMM-based TTS for Mandarin Chinese |
Advisor: | 張智星 Jang, Jyh-Shing Roger |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2009 |
Academic year of graduation: | 97 (ROC calendar) |
Language: | Chinese |
Number of pages: | 37 |
Keywords (Chinese): | 隱藏式馬可夫模型、語音合成、聲學模型、音高追蹤 |
Keywords (English): | Hidden Markov Model, Speech Synthesis, Acoustic Model, Pitch Tracking |
This study aims to improve speech synthesis for Mandarin Chinese. Using a Hidden Markov Model (HMM)-based text-to-speech (TTS) system as the framework, we examine how two factors in the training process affect the synthesized speech: the design of the acoustic model and the pitch tracking algorithm. We implement three acoustic models, “consonants and vowels” (initial/final labeling), “consonants and tonal vowels” (initial/tonal-final labeling), and “right context dependent phonemes of syllables” (within-syllable right-context-dependent labeling), and we compare two pitch tracking methods, “UPDUDP” and “RAPT”. The goal is a Mandarin TTS system whose output sounds natural and fluent.
We used preference tests to evaluate the naturalness of the synthesized speech. Based on the results, we chose “right context dependent phonemes of syllables” as the acoustic model and “RAPT” as the pitch tracking algorithm for the final system. The implemented system is publicly available at http://mirlab.org/Demo/TTS/.
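To make the difference between the three acoustic models concrete, the following is a minimal Python sketch of how one syllable might be labeled under each scheme. It is an illustration only: the function names, the `(initial, final, tone)` input format, the `final + tone digit` spelling of tonal units, and the HTK-style `a+b` notation for right context are all assumptions made here, not the label set actually used in the thesis. In particular, whether the right-context scheme uses tonal finals is assumed.

```python
from typing import List, Tuple

# Hypothetical input format: (initial, final, tone).
# The initial is "" for syllables that have no initial consonant.
Syllable = Tuple[str, str, int]


def label_initial_final(syl: Syllable) -> List[str]:
    """'Consonants and vowels': tone is ignored, units are initial and final."""
    initial, final, _tone = syl
    return [unit for unit in (initial, final) if unit]


def label_initial_tonal_final(syl: Syllable) -> List[str]:
    """'Consonants and tonal vowels': the tone is attached to the final."""
    initial, final, tone = syl
    units = [initial] if initial else []
    units.append(f"{final}{tone}")
    return units


def label_right_context(syl: Syllable) -> List[str]:
    """'Right context dependent phonemes of syllables' (assumed form):
    the initial is conditioned on the final that follows it in the same
    syllable; the final has no right context inside the syllable."""
    initial, final, tone = syl
    tonal_final = f"{final}{tone}"  # assumption: tonal finals are used here
    if not initial:
        return [tonal_final]
    return [f"{initial}+{tonal_final}", tonal_final]


if __name__ == "__main__":
    # "ni3 hao3" written as hypothetical (initial, final, tone) triples.
    for syl in [("n", "i", 3), ("h", "ao", 3)]:
        print(label_initial_final(syl),
              label_initial_tonal_final(syl),
              label_right_context(syl))
```

Under these assumptions, the unit inventory grows from the first scheme to the third, which trades more training data per model for finer modeling of tone and coarticulation within the syllable.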