
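Graduate student: 江蕙如 (Chiang, Hui-ju)
Thesis title: 華語韻律移植的改良 (Improvement of Prosody Transplant for Mandarin Chinese)
Advisor: 張智星 (Jang, Jyh-Shing Roger)
Oral examination committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar)
Language: Chinese
Number of pages: 38
Keywords (Chinese): 韻律移植 (prosody transplantation)
Keywords (English): UPDUDP, PSOLA, WSOLA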
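  • The main goal of this thesis is to improve a prosody transplantation system for Mandarin Chinese, which converts the prosodic parameters of the source speech (tone, duration, and volume) to those of the target speech. The system preserves the timbre of the source speech while placing particular emphasis on naturalness and intelligibility.
    We first perform pitch tracking on the whole utterance with UPDUDP (unbroken pitch determination using dynamic programming). The time-domain average magnitude difference function (AMDF) feature is extracted from the whole utterance, and dynamic programming (DP) is then used to obtain a continuous, unbroken pitch contour. We also use Viterbi decoding to perform forced alignment and segment the utterance into phonemes.
    Pitch marks are likewise obtained with a DP-based method. For the voiced (non-breathy) portions of the speech, we first decide whether peaks or valleys serve as pitch marks. Three candidate pitch marks are then selected within each fixed-length range of the speech, their state probabilities and transition probabilities are computed, and DP finds the optimal sequence of pitch marks.
    Next, the prosodic parameters of the voiced portions are extracted for prosody transplantation. Synthesis is performed in the time domain: pitch synchronous overlap and add (PSOLA) adjusts the fundamental frequency to achieve tone conversion, and waveform similarity overlap and add (WSOLA) adjusts the duration. Finally, taking the interval from each pitch mark to the next in the source and target speech as the frame size, we compute a volume matrix and use it to adjust the volume of the source speech, which achieves volume conversion.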


    The purpose of this study is to show that an enhanced prosody transplantation system for Mandarin Chinese not only achieves pitch, duration, and energy conversion, but also improves naturalness and eliminates distortion.
    UPDUDP (unbroken pitch determination using dynamic programming) is first applied to each utterance for pitch tracking. It extracts an unbroken pitch contour from the utterance by combining a time-domain acoustic feature, the AMDF (average magnitude difference function), with DP (dynamic programming). Utterances are then segmented into phonemes by forced alignment with Viterbi decoding.
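
The pitch-tracking step can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not give the cost function, so the squared lag-jump penalty, the `smooth` weight, the lag range, the frame length, and the function names `amdf` and `track_lags` are all assumptions chosen for illustration.

```python
import math

def amdf(frame, min_lag, max_lag):
    """Average magnitude difference of a frame for each lag in [min_lag, max_lag]."""
    n = len(frame)
    return [
        sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag)) / (n - lag)
        for lag in range(min_lag, max_lag + 1)
    ]

def track_lags(amdf_frames, min_lag, smooth=0.01):
    """DP over frames: choose one lag per frame, minimizing the summed AMDF
    plus an assumed penalty smooth * (lag jump)^2 between adjacent frames."""
    n_lags = len(amdf_frames[0])
    cost = list(amdf_frames[0])
    back = []
    for row in amdf_frames[1:]:
        new_cost, ptr = [], []
        for j in range(n_lags):
            best = min(range(n_lags), key=lambda k: cost[k] + smooth * (j - k) ** 2)
            ptr.append(best)
            new_cost.append(cost[best] + smooth * (j - best) ** 2 + row[j])
        cost = new_cost
        back.append(ptr)
    # Backtrack from the cheapest final state.
    j = min(range(n_lags), key=lambda k: cost[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [min_lag + j for j in path]

# Toy input: a pure sine with a period of 50 samples (160 Hz at an assumed 8 kHz).
fs = 8000
signal = [math.sin(2 * math.pi * n / 50) for n in range(1024)]
frames = [signal[s:s + 256] for s in range(0, 768, 256)]
lags = track_lags([amdf(f, 20, 80) for f in frames], min_lag=20)
pitch_hz = [fs / lag for lag in lags]
```

Because every frame receives exactly one lag, the resulting contour has no gaps, which is the "unbroken" property the abstract refers to. The smoothness penalty discourages sudden jumps such as octave errors.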
    A DP-based pitch marking method is used to detect pitch marks reliably. First, either peaks (local maxima) or valleys (local minima) are selected as pitch mark candidates, according to their similarity to an estimated pitch curve. State and transition probabilities are then defined over the candidates, and DP is employed to find the most likely sequence of pitch marks.
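
A hedged sketch of the candidate-selection idea follows. The record does not define the state and transition probabilities, so the scores below are hypothetical stand-ins: the state score is the candidate's peak amplitude, and the transition score penalizes spacing that deviates from an expected pitch period. The function name `select_pitch_marks` and the `weight` parameter are likewise illustrative.

```python
def select_pitch_marks(candidates, period, weight=1.0):
    """candidates: one list of (position, amplitude) pairs per region.
    Returns one position per region, chosen by Viterbi-style DP."""
    def state(c):
        # Hypothetical state score: a larger peak is more plausible.
        return c[1]

    def trans(a, b):
        # Hypothetical transition score: spacing should be close to the period.
        return -weight * abs((b[0] - a[0]) - period) / period

    score = [state(c) for c in candidates[0]]
    back = []
    for prev, cur in zip(candidates, candidates[1:]):
        new_score, ptr = [], []
        for c in cur:
            k = max(range(len(prev)), key=lambda i: score[i] + trans(prev[i], c))
            ptr.append(k)
            new_score.append(score[k] + trans(prev[k], c) + state(c))
        score = new_score
        back.append(ptr)
    # Backtrack from the best final candidate.
    j = max(range(len(score)), key=lambda i: score[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]

# Toy example: three candidates per region, expected period of 50 samples.
regions = [
    [(10, 0.90), (22, 0.95), (35, 0.30)],
    [(60, 0.90), (70, 0.50), (85, 0.92)],
    [(110, 0.90), (118, 0.40), (135, 0.85)],
]
marks = select_pitch_marks(regions, period=50)
```

In this toy input, picking the largest peak in each region independently would give positions 22, 85, and 110, whose spacing is irregular. The DP instead returns 10, 60, and 110, trading a slightly smaller peak for consistent spacing.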
    The prosodic parameters of the voiced portions of each utterance are then extracted to perform prosody transplantation. The time-domain PSOLA (pitch synchronous overlap and add) technique adjusts the fundamental frequency to achieve pitch conversion, and WSOLA (waveform similarity overlap and add) adjusts duration. For energy conversion, the frame size is set to the interval between one pitch mark and the next in both the source and target waveforms. The volume of each frame is computed, and a linear mapping then adjusts the source energy to the target energy.
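
The energy-conversion step can be sketched as a per-frame gain. The sketch rests on assumptions the record does not confirm: volume is measured as mean absolute amplitude, source and target have the same number of pitch-synchronous frames (for example, after pitch and duration conversion), and the names `frame_volumes` and `match_volume` are illustrative.

```python
def frame_volumes(wave, marks):
    """Mean absolute amplitude of each frame between consecutive pitch marks
    (an assumed volume measure)."""
    return [
        sum(abs(s) for s in wave[a:b]) / (b - a)
        for a, b in zip(marks, marks[1:])
    ]

def match_volume(source, src_marks, target_volumes, eps=1e-12):
    """Scale each source frame linearly so its volume equals the target's."""
    out = list(source)
    vols = frame_volumes(source, src_marks)
    bounds = zip(src_marks, src_marks[1:])
    for (a, b), v_src, v_tgt in zip(bounds, vols, target_volumes):
        gain = v_tgt / max(v_src, eps)  # eps guards against silent frames
        for i in range(a, b):
            out[i] = source[i] * gain
    return out

# Toy example: two frames of four samples each.
source = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
marks = [0, 4, 8]
target_volumes = [0.5, 4.0]
converted = match_volume(source, marks, target_volumes)
```

Using pitch-synchronous frames means each gain applies to one pitch period. A practical system would probably also smooth the gain across frame boundaries to avoid discontinuities, which this sketch omits.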

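    List of Figures and Tables
    Chapter 1 Introduction
      1.1 Research Topic
      1.2 Overview of Related Work
      1.3 Introduction to Speech Synthesis
        1.3.1 Architecture of a Speech Synthesizer
        1.3.2 Overview of Speech Synthesis Methods
      1.4 Research Method of This Thesis
      1.5 Chapter Overview
    Chapter 2 Preprocessing for Prosody Transplantation
      2.1 Mandarin Speech Features and Prosodic Parameters
        2.1.1 Mandarin Speech Features
        2.1.2 Prosodic Parameters
      2.2 Pitch Tracking Method
      2.3 Pitch Marking Method
    Chapter 3 Mandarin Prosody Transplantation System
      3.1 Tone Conversion
        3.1.1 PSOLA Synthesis Technique
      3.2 Duration Conversion
      3.3 Volume Conversion
    Chapter 4 Experimental Results and Analysis
      4.1 Experiment 1: Effect of the Volume Conversion Method on the Waveform
        4.1.1 Purpose
        4.1.2 Environment
        4.1.3 Settings
        4.1.4 Procedure
        4.1.5 Results and Analysis
      4.2 Experiment 2: Effect of the Pitch Marking Method on Tone Conversion and Volume Conversion
        4.2.1 Purpose
        4.2.2 Environment
        4.2.3 Settings
        4.2.4 Procedure
        4.2.5 Results and Analysis
          A. Tone Conversion Results
          B. Volume Conversion Results
      4.3 Experiment 3: Effect of Adjusting the Volume of Unvoiced Portions on the Synthesized Speech
        4.3.1 Purpose
        4.3.2 Environment
        4.3.3 Settings
        4.3.4 Procedure
        4.3.5 Results and Analysis
      4.4 Error Analysis
        4.4.1 Pitch Tracking
        4.4.2 Tone Conversion
    Chapter 5 Conclusions and Future Work
    References
    Appendix: Experimental Utterances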


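    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)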
