
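Graduate student: Lo, Hsu-Ying (羅珝瑩)
Thesis title: An Initial Study on HMM-based TTS for Mandarin Chinese (根基於 HMM 之華語語音合成初步研究)
Advisor: Jang, Jyh-Shing Roger (張智星)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2009
Academic year of graduation: 97
Language: Chinese
Number of pages: 37
Chinese keywords: 隱藏式馬可夫模型, 語音合成, 聲學模型, 音高追蹤
English keywords: Hidden Markov Model, Speech Synthesis, Acoustic Model, Pitch Tracking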
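  • This study aims to improve Mandarin Chinese speech synthesis. Using an HMM-based speech synthesis system as the framework, we examine how different acoustic models (“consonants and vowels” labeling, “consonants and tonal vowels” labeling, and “right context dependent phonemes of syllables” labeling) and different pitch tracking methods (“UPDUDP” and “RAPT”) affect the synthesized speech, with the goal of building a natural and fluent Mandarin speech synthesis system.
    We used preference tests to evaluate the naturalness of the synthesized speech. Based on the results, we adopted “right context dependent phonemes of syllables” labeling as the system's acoustic model and “RAPT” as its pitch tracking method. The completed Mandarin speech synthesis system is demonstrated at http://mirlab.org/Demo/TTS/.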


    In this study, we aim to improve a Hidden Markov Model (HMM)-based text-to-speech (TTS) system for Mandarin Chinese so that the synthesized speech sounds smoother and more fluent. We consider two factors: the design of the acoustic model and the pitch tracking algorithm used during training. We implement three acoustic models: “consonants and vowels”, “consonants and tonal vowels”, and “right context dependent phonemes of syllables”. For pitch tracking, we compare “RAPT” against “UPDUDP”.
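The three labeling schemes named in the abstract can be illustrated with a small sketch. This record does not give the thesis's actual label format, so everything here is an assumption made for illustration: the function names, the tone-numbered pinyin input (such as `ma3`), the HTK-style `x+y` notation for "x with right context y", and the choice to keep the tone on the final in the third scheme.

```python
# Hypothetical illustration only: the label formats below are assumptions,
# not the label set actually used in the thesis.

# Two-letter initials come first so "zh" is matched before "z".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def split_syllable(syllable: str) -> tuple[str, str, str]:
    """Split tone-numbered pinyin (e.g. 'ma3') into (initial, final, tone)."""
    tone, body = syllable[-1], syllable[:-1]
    if not tone.isdigit() or not body:
        raise ValueError(f"expected tone-numbered pinyin, got {syllable!r}")
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    # No listed initial (e.g. 'an1'); 'y'/'w' spellings also land here.
    return "", body, tone


def label_cv(syllable: str) -> list[str]:
    """'Consonants and vowels': initial and final, tone ignored."""
    initial, final, _ = split_syllable(syllable)
    return [unit for unit in (initial, final) if unit]


def label_cv_tonal(syllable: str) -> list[str]:
    """'Consonants and tonal vowels': the final carries the tone."""
    initial, final, tone = split_syllable(syllable)
    return [unit for unit in (initial, final + tone) if unit]


def label_right_context(syllable: str) -> list[str]:
    """Right context inside the syllable: the initial is tagged with the
    final that follows it (assumed here to be the tonal final)."""
    initial, final, tone = split_syllable(syllable)
    tonal_final = final + tone
    if not initial:
        return [tonal_final]
    return [f"{initial}+{tonal_final}", tonal_final]


if __name__ == "__main__":
    for scheme in (label_cv, label_cv_tonal, label_right_context):
        print(scheme.__name__, scheme("zhong1"), scheme("ma3"))
```

Under these assumptions, each scheme is more specific than the one before it: the first shares one model per final across all tones, the second separates finals by tone, and the third additionally separates each initial by the final that follows it.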
    We used preference tests to evaluate the synthesized speech. Based on the results, we chose “right context dependent phonemes of syllables” as the acoustic model and “RAPT” as the pitch tracking algorithm for our speech synthesis system. The implemented system is publicly available at http://mirlab.org/Demo/TTS/.
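A preference test of the kind mentioned above is often summarized by counting how many listener judgments favored each system. The sketch below shows one way to do that. The record does not say how the thesis scored its tests, so the function name, the exact two-sided sign test, and the vote counts in the usage lines are assumptions for illustration, not the thesis's procedure or results.

```python
from math import comb


def preference_summary(votes_a: int, votes_b: int) -> dict[str, float]:
    """Summarize an A/B preference test.

    Returns each system's preference rate and the p-value of an exact
    two-sided sign test under the null hypothesis of no preference.
    """
    if votes_a < 0 or votes_b < 0:
        raise ValueError("vote counts must be non-negative")
    total = votes_a + votes_b
    if total == 0:
        raise ValueError("at least one vote is required")
    larger = max(votes_a, votes_b)
    # P(X >= larger) for X ~ Binomial(total, 0.5); doubled for two sides.
    tail = sum(comb(total, i) for i in range(larger, total + 1)) / 2 ** total
    return {
        "rate_a": votes_a / total,
        "rate_b": votes_b / total,
        "p_value": min(1.0, 2 * tail),
    }


if __name__ == "__main__":
    # Made-up counts for illustration; these are not results from the thesis.
    print(preference_summary(votes_a=8, votes_b=2))
```

The sign test was chosen here only because it needs no assumptions beyond independent judgments; with few listeners, even a lopsided split may not reach significance.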

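    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Related Work
        1.3 Architecture of the HMM-based Mandarin Speech Synthesis System
        1.4 Chapter Overview
    Chapter 2 Building the HMM-based Mandarin Speech Synthesis System
        2.1 Mandarin Acoustic Models
            2.1.1 Consonants and Vowels Labeling
            2.1.2 Consonants and Tonal Vowels Labeling
            2.1.3 Right Context Dependent Phonemes of Syllables Labeling
        2.2 Automatic Boundary Labeling
        2.3 Tone Labeling
            2.3.1 Third-Tone Sandhi
            2.3.2 Tone Sandhi of 「一」
            2.3.3 Tone Sandhi of 「不」
        2.4 Contextual Information
        2.5 Decision Tree Question Set
        2.6 Decision Trees
        2.7 Pitch Tracking
            2.7.1 UPDUDP
            2.7.2 RAPT
    Chapter 3 Evaluation Results and Analysis
        3.1 Experimental Setup
            3.1.1 Training Data
            3.1.2 Evaluation Method
        3.2 Experimental Results
        3.3 Discussion and Analysis
    Chapter 4 Conclusions and Future Work
    References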


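    Full text release date: full text not authorized for public release (on-campus network)
    Full text release date: full text not authorized for public release (off-campus network)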
