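Graduate student: | 詹詩涵 Shih-Han Chan |
---|---|
Thesis title: | 基於音高調節之歌聲合成系統 A Singing Voice Synthesis System Based On Pitch Curve Modulation |
Advisor: | 張智星 Jyh-Shing Roger Jang |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2006 |
Academic year of graduation: | 94 |
Language: | Chinese |
Number of pages: | 42 |
Chinese keywords: | 歌聲合成 (singing voice synthesis), 音高曲線 (pitch curve), 基頻軌跡 (fundamental frequency contour) |
English keywords: | Singing Voice Synthesis, Pitch Curve, Pitch Contour |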
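In this thesis, we improve the naturalness of synthesized singing by adjusting the pitch curve. The thesis focuses on how to generate a pitch curve close to that of actual singing, which then serves as the basis for synthesis, and implements two approaches: (1) using a support vector machine (SVM) to predict the pitch curve; (2) using our proposed rule-based pitch prediction equations to model the pitch curve under ten different conditions. In addition, we use a pitch-synchronous cross-fading method to resolve discontinuities at speech concatenation points, and we add effects such as vibrato and reverberation to embellish the synthesized singing. Finally, a listening test confirms that, compared with traditional singing voice synthesis methods, our proposed rule-based pitch prediction equations make the synthesized voice sound more natural and pleasant.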
This study proposes a singing voice synthesis system that improves the naturalness of the synthetic singing voice by modifying pitch curves.
Our goal is to produce a pitch curve similar to that of an actual singing voice. We employ two methods for pitch-curve prediction. In the first, we train a support vector machine (SVM) regression model to predict pitch curves. In the second, we propose a rule-based approach comprising 10 manually tuned equations that model the pitch curve under different conditions.
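The idea of a rule-based pitch curve can be illustrated with a minimal sketch. The rule below is hypothetical and is not one of the thesis's 10 equations, which this record does not give: each note glides exponentially, in the semitone domain, from the previous note's pitch toward its own target. The names `midi_to_hz` and `rule_based_pitch_curve` and the parameters `frame_rate` and `glide_time` are assumptions made for illustration.

```python
import math


def midi_to_hz(note):
    """Convert a (possibly fractional) MIDI note number to Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def rule_based_pitch_curve(notes, frame_rate=100, glide_time=0.05):
    """Return one pitch value in Hz per frame for a list of (midi_note, seconds).

    Hypothetical rule (an assumption, not the thesis's equations): the pitch
    approaches each note's target with first-order exponential smoothing,
    starting from wherever the previous note ended.
    """
    # Per-frame smoothing factor derived from the glide time constant.
    alpha = 1.0 - math.exp(-1.0 / (glide_time * frame_rate))
    curve = []
    current = None  # running pitch in semitones (MIDI scale)
    for note, duration in notes:
        if current is None:
            current = float(note)  # the first note starts on target
        for _ in range(int(round(duration * frame_rate))):
            current += alpha * (note - current)
            curve.append(midi_to_hz(current))
    return curve
```

Calling `rule_based_pitch_curve([(60, 0.5), (64, 0.5)])` yields 100 frames that hold C4 and then glide smoothly up to E4 instead of jumping, which is the kind of transition a flat, note-by-note pitch track lacks.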
In the second half of the thesis, we discuss the signal processing techniques used to modify pitch, duration, and volume. We also address ill-articulated pronunciation and discontinuities at syllable concatenation points with a pitch-synchronous cross-fading approach. In addition, we add euphonious effects such as vibrato and reverberation.
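Two of the techniques named above, cross-fading at a concatenation point and vibrato, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis's implementation: the cross-fade is linear, its overlap is set to one pitch period, and both segments are assumed to be already aligned at pitch marks; the vibrato is a plain sinusoidal modulation of the pitch curve in cents. All function names and default values (`rate_hz`, `depth_cents`, and so on) are hypothetical.

```python
import math


def pitch_period_samples(sample_rate, f0_hz):
    """Length of one pitch period in samples, used here as the overlap length."""
    return max(1, int(round(sample_rate / f0_hz)))


def crossfade(left, right, overlap):
    """Join two sample lists, linearly cross-fading over `overlap` samples.

    Assumes the end of `left` and the start of `right` are already aligned
    at pitch marks, so the overlapped periods are in phase.
    """
    if overlap <= 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must be in 1..min(len(left), len(right))")
    start = len(left) - overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `right`, rising toward 1
        mixed.append((1.0 - w) * left[start + i] + w * right[i])
    return left[:start] + mixed + right[overlap:]


def add_vibrato(pitch_hz, frame_rate=100, rate_hz=5.5, depth_cents=50.0):
    """Modulate a per-frame pitch curve (Hz) with a sinusoid measured in cents."""
    out = []
    for i, f0 in enumerate(pitch_hz):
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * i / frame_rate)
        out.append(f0 * 2.0 ** (cents / 1200.0))
    return out
```

For example, joining two syllables sung near 200 Hz at a 16 kHz sampling rate would use `crossfade(a, b, pitch_period_samples(16000, 200.0))`. Tying the overlap to the pitch period is what makes the fade pitch-synchronous in this sketch; a fixed-length overlap could mix periods that are out of phase.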
Finally, we assess the proposed methods through pitch curve observation and a listening test. The results verify that the proposed rule-based approach makes the synthetic singing voice more natural than traditional singing voice synthesis approaches.