Graduate student: 吳嘉彧 Wu, Chia-Yu
Thesis title: 頻譜頻軸映對結合線頻譜頻率映對之語者特質轉換系統 (A Study on Voice Conversion System using Frequency Mapping and LSF Mapping)
Advisor: 王小川 Wang, Hsiao-Chuan
Committee members:
Degree: Master (碩士)
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2009
Academic year of graduation: 97
Language: Chinese
Pages: 44
Keywords (Chinese): 語者特質轉換, 平行句語料, 共振峰映對, 線頻譜頻率映對
Keywords (English): voice conversion, parallel sentence corpus, formant mapping, LSF mapping
Voice conversion has been studied for many years and has a wide range of applications: back-end processing for text-to-speech systems (integrating corpora from different speakers, adding emotional variation to the synthesized speech), back-end processing for speech translation systems (preserving the speaker's identity), improving speech perception for hearing-impaired listeners, and supporting language-learning systems (hearing compensation, or converting speech into a voice the user is familiar with so that it is easier to accept).
Besides improving conversion similarity and preserving speech quality, practical issues must also be considered. From the vector-quantization codebooks of early work to the Gaussian mixture models that are now widely used, the training corpus usually consists of parallel sentences aligned by dynamic time warping. Given the difficulty of recording parallel sentences and the convenience users expect, the amount of training data must be reduced. In recent years, research on cross-lingual voice conversion has also appeared; since parallel sentences are hard to obtain in that setting, both the methods and the corpora have had to be adapted.
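The dynamic time warping alignment used to pair frames of parallel sentences can be sketched as follows. This is a minimal illustration, assuming frame-by-dimension spectral feature matrices as NumPy arrays; the function name `dtw_align` and the symmetric step pattern are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def dtw_align(X, Y):
    """Align two feature sequences (frames x dims) with dynamic time warping.

    Returns the list of (i, j) frame index pairs on the optimal path.
    In parallel-sentence voice conversion, such pairs supply the
    source/target feature correspondences used to train a mapping.
    """
    n, m = len(X), len(Y)
    # Local Euclidean distances between all frame pairs.
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Accumulated cost with the standard step pattern (diag, up, left).
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = dist[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end of both sequences to the start.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Each pair on the returned path associates one source frame with one target frame, so repeated indices absorb the speakers' different speaking rates.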
Several such methods have been proposed: spectral frequency-axis mapping based on the differing formant characteristics of the speakers, line spectral frequency (LSF) conversion combined with other techniques, and adaptation of the conversion-function parameters between speakers to reduce the errors introduced by non-parallel sentences. These methods use less training data than before, and non-parallel corpora have begun to be used for voice conversion.
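As background on the LSF representation mentioned above: line spectral frequencies come from the LPC polynomial A(z) through its sum and difference polynomials P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z), whose roots lie on the unit circle. A minimal root-finding sketch in NumPy (real systems would typically use the faster Chebyshev-based method instead):

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral
    frequencies in radians, sorted ascending.

    P(z) is palindromic and Q(z) is antipalindromic, so their roots lie
    on the unit circle; the root angles in (0, pi) are the LSFs.
    """
    b = np.concatenate((np.asarray(a, dtype=float), [0.0]))
    P = b + b[::-1]  # sum polynomial
    Q = b - b[::-1]  # difference polynomial
    lsf = []
    for poly in (P, Q):
        ang = np.angle(np.roots(poly))
        # Keep one angle per conjugate pair; drop trivial roots at z = +/-1.
        lsf.extend(ang[(ang > 1e-6) & (ang < np.pi - 1e-6)])
    return np.sort(np.array(lsf))
```

For a stable A(z) the angles from P and Q interleave, which is what makes LSFs well suited to interpolation and to the frame-wise mapping used in LSF-based conversion.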
Assuming the source and target speakers have no sentences with identical content available for training, and at most utter the same syllables, the goal of this thesis is to study voice conversion without parallel-sentence training, using only a spectral frequency-axis mapping based on the formant characteristics of the speakers' syllables, combined with LSF conversion.
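The spectral frequency-axis mapping described above can be illustrated with a piecewise-linear warp anchored at corresponding formant frequencies. This is a minimal sketch, assuming a magnitude spectrum sampled on a uniform frequency grid; the function name, the 8 kHz Nyquist default, and the linear interpolation between anchors are assumptions for illustration, not the thesis's exact procedure.

```python
import numpy as np

def formant_warp(spectrum, freqs, src_formants, tgt_formants, nyquist=8000.0):
    """Piecewise-linear frequency-axis warping driven by formant anchors.

    src_formants / tgt_formants are per-syllable formant estimates (Hz)
    for the source and target speaker. The warp maps the source axis so
    each source formant lands on the corresponding target formant,
    interpolating linearly in between (0 Hz and Nyquist stay fixed).
    """
    # Anchor points of the warping function (source Hz -> target Hz).
    src_anchor = np.concatenate(([0.0], np.asarray(src_formants), [nyquist]))
    tgt_anchor = np.concatenate(([0.0], np.asarray(tgt_formants), [nyquist]))
    # For each target-axis frequency, find the source frequency it maps
    # from, then resample the source spectrum at that frequency.
    inv_freqs = np.interp(freqs, tgt_anchor, src_anchor)
    return np.interp(inv_freqs, freqs, spectrum)
```

For example, with a single anchor pair (2000 Hz, 4000 Hz) the warped spectrum at 4000 Hz takes the source spectrum's value at 2000 Hz, shifting the formant peak to the target speaker's location.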
Voice conversion has been used in many applications. The methods based on vector quantization codebooks and Gaussian mixture models need dynamic time warping on a parallel sentence corpus to generate the mapping functions. Recent studies try to use less training data, or even to work without a parallel sentence corpus. This thesis presents a voice conversion method that does not use a parallel sentence corpus. It applies formant mapping and line spectral frequency mapping to build a voice conversion system.
[1] M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, “Voice Conversion through Vector Quantization,” Proc. IEEE ICASSP, pp. 655-658, 1988.
[2] M. Narendranath, H. A. Murthy, S. Rajendran, and B. Yegnanarayana, “Transformation of formants for voice conversion using artificial neural networks,” Speech Communication, vol. 16, pp. 207-216, 1995.
[3] E. K. Kim, S. Lee, and Y. H. Oh, “Hidden Markov model based voice conversion using dynamic characteristics of speaker,” Proc. EUROSPEECH, vol. 5, Rhodes, Greece, 1997.
[4] Y. Stylianou, O. Cappe, and E. Moulines, “Continuous Probabilistic Transform for Voice Conversion,” IEEE Trans. on Speech and Audio Processing, vol. 6, no. 2, pp. 131-142, 1998.
[5] T. Toda, H. Saruwatari, and K. Shikano, “Voice Conversion Algorithm based on Gaussian Mixture Model with Dynamic Frequency Warping of STRAIGHT Spectrum,” Proc. IEEE ICASSP, pp. 841-844, 2001.
[6] Z. Shuang, R. Bakis, and Y. Qin, “Voice Conversion Based on Mapping Formants,” TC-STAR Workshop on Speech-to-Speech Translation, pp. 219-223, 2006.
[7] Z. Hanzlicek and J. Matousek, “On Using Warping Function for LSFs Transformation in a Voice Conversion System,” Proc. IEEE ICSP, 2008.
[8] K. Liu, J. Zhang, and Y. Yan, “High Quality Voice Conversion through Combining Modified GMM and Formant Mapping for Mandarin,” IEEE ICDT, 2007.
[9] E. Helander, J. Nurminen, and M. Gabbouj, “LSF Mapping for Voice Conversion with Very Small Training Sets,” Proc. IEEE ICASSP, 2008.
[10] H. Höge, “Project Proposal TC-STAR - Make Speech to Speech Translation Real,” Proc. LREC, Las Palmas, Spain, 2002.
[11] D. Sündermann, H. Höge, A. Bonafonte, H. Ney, and J. Hirschberg, “TC-Star: Cross-Language Voice Conversion Revisited,” TC-STAR Workshop on Speech-to-Speech Translation, 2006.
[12] A. Mouchtaris, J. Van der Spiegel, and P. Mueller, “Nonparallel Training for Voice Conversion Based on a Parameter Adaptation Approach,” IEEE Trans. on Speech and Audio Processing, vol. 14, no. 3, 2006.
[13] H. Valbret, E. Moulines, and J. P. Tubach, “Voice Transformation using PSOLA Technique,” Proc. IEEE ICASSP, San Francisco, USA, pp. 145-148, 1992.
[14] Y. Gao and Z. Yang, “Pitch Modification Based on Syllable Units for Voice Morphing System,” IFIP International Conference on Network and Parallel Computing Workshops, 2007.
[15] H.-C. Wang (王小川), Speech Signal Processing (語音訊號處理), Chuan Hwa Book Co., 2004.