
Graduate Student: 李函軒 (Li, Han-Xuan)
Thesis Title: 基於隱藏式半馬可夫模型之中文文句轉語音系統及其模型調適與聲音轉換
(Mandarin Chinese Text-to-Speech System Based on Hidden Semi-Markov Models and Its Model Adaptation and Voice Conversion)
Advisors: 王小川 (Wang, Hsiao-Chuan); 鐘太郎 (Jong, Tai-Lang)
Oral Defense Committee: 陳信宏 (Chen, Sin-Horng); 王逸如 (Wang, Yih-Ru)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2013
Graduation Academic Year: 101 (2012–2013)
Language: Chinese
Pages: 76
Keywords: hidden semi-Markov model, model adaptation, voice conversion
Abstract:
A text-to-speech (TTS) system based on hidden semi-Markov models (HSMMs) describes the speech synthesis units and their state durations with statistical models: the input text is represented as a sequence of synthesis units, which is then converted into speech output. Because changing the model parameters of the synthesis units changes the synthesized voice, model adaptation methods can make the synthesized speech approach the target speaker's voice quality, emotional characteristics, or prosodic rhythm, thereby achieving voice conversion.
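The defining difference between an HSMM and a plain HMM is explicit state-duration modeling: instead of the geometric durations implied by self-transitions, each state carries its own duration distribution. The sketch below is a minimal Python illustration of this idea, not code from the thesis; the five-state means and variances are hypothetical.

```python
import math
import random

def sample_state_durations(duration_means, duration_vars, seed=0):
    """Sample an explicit duration (in frames) for each HSMM state.

    A plain HMM's self-transition probability implies a geometric
    duration distribution; an HSMM instead gives each state its own
    duration model (here a Gaussian, rounded and floored to 1 frame).
    """
    rng = random.Random(seed)
    durations = []
    for mean, var in zip(duration_means, duration_vars):
        d = rng.gauss(mean, math.sqrt(var))
        durations.append(max(1, round(d)))
    return durations

# Hypothetical 5-state synthesis unit: per-state duration mean/variance.
means = [3.0, 8.0, 12.0, 8.0, 4.0]
variances = [1.0, 4.0, 9.0, 4.0, 1.0]
durations = sample_state_durations(means, variances)
total_frames = sum(durations)  # frames of speech this unit occupies
```

In a real HSMM synthesizer the duration models are context-dependent and trained from data; adapting their parameters is one way the speaking rhythm of a target speaker can be transferred.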
This thesis goes further by taking the residual signal of the target speaker's speech and adding it to the excitation signal of the speech production model, bringing the synthesized speech closer to the target speaker's voice. Two methods of incorporating the residual signal are proposed, and the synthesized speech is evaluated both subjectively and objectively. In the subjective experiments, one residual-incorporation method produced audible discontinuities, while the other did not. For the objective evaluation, Gaussian mixture models (GMMs) are trained on the synthesized speech and on the target speaker's speech, and the Kullback-Leibler (KL) divergence between the GMMs is measured; both residual-incorporation methods bring the synthesized speech measurably closer to the target speaker's speech.
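The KL divergence between two GMMs has no closed form, so it is commonly estimated by Monte Carlo sampling (the family of approximations surveyed in reference [24]). Below is a minimal one-dimensional Python sketch of that estimator; the mixture parameters are made up, standing in for feature distributions of synthesized and target-speaker speech.

```python
import math
import random

def gmm_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )

def mc_kl(gmm_f, gmm_g, n=20000, seed=0):
    """Monte Carlo estimate of KL(f || g) between two GMMs.

    Draw samples x ~ f and average log f(x) - log g(x); the estimate
    converges to the true divergence as n grows.
    """
    weights, means, stds = gmm_f
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        # Sample from f: pick a mixture component, then sample it.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        x = rng.gauss(means[k], stds[k])
        acc += math.log(gmm_pdf(x, *gmm_f)) - math.log(gmm_pdf(x, *gmm_g))
    return acc / n

# Hypothetical mixtures: (weights, means, stds).
f = ([0.5, 0.5], [-1.0, 1.0], [0.7, 0.7])  # "synthesized speech" features
g = ([0.4, 0.6], [-0.8, 1.2], [0.8, 0.9])  # "target speaker" features
kl = mc_kl(f, g)
```

A smaller KL estimate between the synthesized-speech GMM and the target-speaker GMM indicates that the two feature distributions, and hence the two voices, are closer.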


Table of Contents:
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1 Motivation and Objectives
  1.2 Background
    1.2.1 Unit Selection Synthesis
    1.2.2 Statistical Parametric Synthesis
    1.2.3 Voice Conversion
  1.3 Research Direction and System Architecture
  1.4 Chapter Overview
Chapter 2  Methods and Background Theory
  2.1 Characteristics of Mandarin Speech
  2.2 Vocal Tract Model
  2.3 Feature Parameter Extraction
    2.3.1 Spectral Features
    2.3.2 Excitation Signal
  2.4 Hidden Markov Models and Hidden Semi-Markov Models
    2.4.1 Hidden Markov Models
    2.4.2 HMM State Duration Probability Models
    2.4.3 Hidden Semi-Markov Models
  2.5 Context-Dependent Models and Decision Trees
    2.5.1 Context-Dependent Models
    2.5.2 Decision Tree Construction
    2.5.3 Decision Trees Applied to Adapting HMM-Based Speech Synthesis Systems
  2.6 Hidden Semi-Markov Model Adaptation
    2.6.1 Constrained and Unconstrained Transforms
    2.6.2 Constrained Maximum Likelihood Linear Regression (CMLLR)
    2.6.3 Constrained Structural Maximum A Posteriori Linear Regression (CSMAPLR)
  2.7 HMM-Based Speech Synthesis
Chapter 3  System Implementation
  3.1 System Construction
  3.2 Corpus
  3.3 Feature Extraction
  3.4 Question Set
  3.5 Model Training and Adaptation
  3.6 Residual Signal Selection and Incorporation into Synthesis
Chapter 4  Experiments and Results
  4.1 Subjective Listening Experiments
  4.2 Objective Evaluation
Chapter 5  Conclusions and Future Work
References
Appendices 1–5

References:
[1] M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization," J. Acoust. Soc. Jpn. (E), vol. 11, no. 2, pp. 71–76, 1990.
[2] T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222–2235, Nov. 2007.
[3] M. Charlier, Y. Ohtani, T. Toda, A. Moinet, and T. Dutoit, "Cross-language voice conversion based on eigenvoices," in Proc. INTERSPEECH, Brighton, UK, Sep. 2009, pp. 1635–1638.
[4] A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP, Seattle, USA, May 1998, pp. 285–288.
[5] A. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP, 1996, pp. 373–376.
[6] H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Hidden semi-Markov model based speech synthesis," in Proc. ICSLP, 2004, pp. 1180–1185.
[7] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, June 2000, pp. 1315–1318.
[8] T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. & Syst., vol. E90-D, no. 5, pp. 816–824, May 2007.
[9] S. Imai, "Cepstral analysis synthesis on the mel frequency scale," in Proc. ICASSP, 1983, pp. 93–96.
[10] J. Yamagishi, H. Zen, T. Toda, and K. Tokuda, "Speaker-independent HMM-based speech synthesis system: HTS-2007 system for the Blizzard Challenge 2007," in Proc. Blizzard Challenge Workshop, 2007.
[11] T. Kobayashi and S. Imai, "Spectral analysis using generalized cepstrum," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1087–1089, Oct. 1984.
[12] K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis: a unified approach to speech spectral estimation," in Proc. ICASSP, 1994, pp. 1043–1064.
[13] G. Muhammad, "Extended average magnitude difference function based pitch detection," Int. Arab Journal of Information Technology (IAJIT), vol. 8, no. 2, pp. 197–203, April 2011.
[14] K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Hidden Markov models based on multi-space probability distribution for pitch pattern modeling," in Proc. ICASSP, March 1999, pp. 229–232.
[15] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, Budapest, Hungary, 1999.
[16] H. Zen, T. Masuko, T. Yoshimura, K. Tokuda, T. Kobayashi, and T. Kitamura, "State duration modeling for HMM-based speech synthesis," IEICE Trans. Inf. & Syst., vol. E90-D, no. 3, pp. 692–693, 2007.
[17] K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Jpn. (E), vol. 21, pp. 79–86, March 2000.
[18] L. F. Uebel and P. C. Woodland, "An investigation into vocal tract length normalisation," in Proc. Eurospeech, Budapest, Hungary, Sep. 1999, vol. 6, pp. 2527–2530.
[19] C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech & Language, vol. 9, pp. 171–185, 1995.
[20] O. Siohan, T. Myrvoll, and C.-H. Lee, "Structural maximum a posteriori linear regression for fast HMM adaptation," Computer Speech & Language, vol. 16, no. 3, pp. 5–24, 2002.
[21] J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech, and Language Processing, vol. 17, Jan. 2009.

[23] J. Goldberger, S. Gordon, and H. Greenspan, "An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures," in Proc. ICCV, Nice, Oct. 2003, vol. 1, pp. 487–493.
[24] J. R. Hershey and P. A. Olsen, "Approximating the Kullback-Leibler divergence between Gaussian mixture models," in Proc. ICASSP, 2007, vol. 4, pp. 317–320.
[25] Speech Signal Processing Toolkit (SPTK), http://sp-tk.sourceforge.net/
[26] The Hidden Markov Model Toolkit (HTK), http://htk.eng.cam.ac.uk/
[27] HMM-based Speech Synthesis System (HTS), http://hts.sp.nitech.ac.jp/
[28] HTS Engine, http://hts-engine.sourceforge.net/
[29] WaveSurfer, http://www.speech.kth.se/wavesurfer/
[30] Academia Sinica CKIP Chinese Word Segmentation System (中研院斷詞系統), http://ckipsvr.iis.sinica.edu.tw/
[31] 王小川 (H.-C. Wang), 語音訊號處理 (Speech Signal Processing), 全華圖書公司, 2007.

Full-Text Availability: Not authorized for public release (campus and off-campus networks)
