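A Corpus-based Singing Voice Synthesis System for Mandarin Chinese | National Tsing Hua University Theses and Dissertations Repository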

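Student:	林姿瑩 Tzu-Ying Lin
Thesis title:	以歌聲語料庫為主的中文歌聲合成系統 A Corpus-based Singing Voice Synthesis System for Mandarin Chinese
Advisor:	張智星 Jyh-Shing Roger Jang
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications
Year of publication:	2005
Graduation academic year:	93 (ROC calendar, i.e. 2004-2005)
Language:	English
Number of pages:	48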
Keywords:	Singing Voice Synthesis, Corpus Design, PSOLA (pitch-synchronous overlap-add), WSOLA (waveform-similarity overlap-add), Vibrato

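This thesis focuses on a large-corpus-based singing voice synthesis system for Mandarin Chinese, and also describes the basic implementation of the related singing voice synthesis techniques along with further enhancements.
First, an overview of singing voice synthesis is given, with a brief explanation of how singing voice and speech differ in their characteristics. Prior work on voice analysis and singing voice synthesis is also surveyed.
For the system as a whole, we designed three singing voice corpora for Mandarin singing voice synthesis and describe the algorithms used to select units from them. We then detail the unit synthesis techniques and the further processing motivated by the characteristics of singing, and introduce several enhancing sound effects, with the aim of producing more natural synthesized Mandarin singing. Finally, tests are conducted, and the experimental results and directions for future improvement are summarized.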
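The further processing described above includes singing-specific pitch control and added sound effects. As a rough illustration only, the sketch below applies vibrato as a sinusoidal modulation of a frame-level F0 contour, and echo as a feedback delay line on the output samples. The function names `add_vibrato` and `add_echo`, the default vibrato rate (5.5 Hz) and depth (50 cents), and the echo delay and feedback gain are illustrative assumptions, not the methods or parameters used in the thesis.

```python
import math
from typing import List


def add_vibrato(f0_hz: List[float], frame_rate: float,
                rate_hz: float = 5.5, depth_cents: float = 50.0) -> List[float]:
    """Sinusoidally modulate a frame-level F0 contour (Hz).

    frame_rate is the number of F0 frames per second; depth_cents is the
    peak deviation from the original pitch (assumed defaults). Unvoiced
    frames (f0 <= 0) are passed through unchanged.
    """
    out: List[float] = []
    for i, f0 in enumerate(f0_hz):
        if f0 <= 0:
            out.append(f0)
            continue
        t = i / frame_rate  # time of frame i in seconds
        cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
        out.append(f0 * 2.0 ** (cents / 1200.0))  # 1200 cents per octave
    return out


def add_echo(x: List[float], sample_rate: int,
             delay_s: float = 0.25, feedback: float = 0.4) -> List[float]:
    """Feedback echo: y[n] = x[n] + feedback * y[n - D], D = delay in samples.

    Each repeat is scaled by `feedback` relative to the previous one, so
    its magnitude must stay below 1 for the echo to die away.
    """
    d = max(1, round(delay_s * sample_rate))
    y = [0.0] * len(x)
    for n, sample in enumerate(x):
        y[n] = sample + (feedback * y[n - d] if n >= d else 0.0)
    return y


if __name__ == "__main__":
    contour = add_vibrato([220.0] * 100, frame_rate=100.0)  # 1 s of A3
    print(round(min(contour), 1), round(max(contour), 1))
    impulse = [1.0] + [0.0] * 9
    print(add_echo(impulse, sample_rate=10, delay_s=0.3))
```

Keeping vibrato in the F0 domain makes its rate and depth independent of the recorded units; the modulated contour would then drive a pitch-shifting step (for example PSOLA) on the selected units, while the echo is applied to the final waveform.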

In this research, a corpus-based singing voice synthesis system for Mandarin Chinese was proposed, together with basic extensions and improvements to singing voice synthesis techniques.
First, the design rules of three corpora for Mandarin Chinese singing voice synthesis were presented, along with corpus design and preprocessing methods developed specifically for the singing voice. Dynamic programming (similar to a Viterbi search) was then applied to select the optimal synthesis units based on a combination of two distance functions. Furthermore, several sound effects, such as echo, vibrato, and background music, were implemented to enhance the naturalness of the synthesized singing. Finally, a simple listening test was conducted on three synthesis setups to verify the feasibility of the system.
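The dynamic-programming unit selection described above can be sketched as a Viterbi-style search. This is a minimal sketch under assumptions: the two distance functions are taken to be a target cost (how well a corpus unit matches the desired note) and a concatenation cost (how well two adjacent units join), combined as a weighted sum with a hypothetical weight `w_concat`. The function `select_units` and the toy unit features (pitch in semitones, boundary energy) are illustrative, not the thesis's actual distance definitions.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # description of a target note/syllable (hypothetical)
U = TypeVar("U")  # description of a corpus unit (hypothetical)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    concat_cost: Callable[[U, U], float],
    w_concat: float = 1.0,
) -> Tuple[List[int], float]:
    """Viterbi-style search for the unit sequence with minimum total cost.

    candidates[i] is a non-empty list of corpus units that could realize
    targets[i]. Returns the chosen candidate index at each position and the
    total cost (sum of target costs + w_concat * concatenation costs).
    """
    if not targets:
        return [], 0.0
    # cost[i][j]: cheapest accumulated cost of any path ending in candidates[i][j]
    cost: List[List[float]] = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the best predecessor of candidates[i][j]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row: List[float] = []
        ptr: List[int] = []
        for u in candidates[i]:
            # accumulated cost of reaching u from each unit at position i - 1
            scores = [cost[i - 1][k] + w_concat * concat_cost(prev, u)
                      for k, prev in enumerate(candidates[i - 1])]
            best_k = min(range(len(scores)), key=scores.__getitem__)
            row.append(scores[best_k] + target_cost(targets[i], u))
            ptr.append(best_k)
        cost.append(row)
        back.append(ptr)
    # trace back from the cheapest candidate at the last position
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][j]
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Toy example: units are (pitch in semitones, boundary energy) pairs.
    targets = [60, 62]
    cands = [[(60, 1.0), (60, 0.0)], [(62, 0.0)]]
    path, total = select_units(
        targets, cands,
        target_cost=lambda t, u: abs(t - u[0]),
        concat_cost=lambda a, b: abs(a[1] - b[1]),
    )
    print(path, total)  # [1, 0] 0.0
```

Each position keeps only the best accumulated cost per candidate plus a back-pointer, so the search costs O(N·K²) for N notes with K candidates each, instead of enumerating all K^N unit sequences.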

Summary
Summary (in Chinese)
Acknowledgments
Contents
List of Figures
Chapter 1. Introduction
1.1. Singing Voice Synthesis
1.2. Singing Voice and Speech
1.3. Research Overview
Chapter 2. Background
2.1. Voice Analysis and Synthesis
2.2. Approaches of Singing Voice Synthesis
2.2.1. Physical Models
2.2.2. Formant-based Synthesis
2.2.3. Sinusoid-based Synthesis
Chapter 3. Singing Corpus Construction
3.1. Design of Mandarin Singing Corpus
3.1.1. Single-syllable-based Corpus (SSC)
3.1.2. Coarticulation-based Corpus (CC)
3.1.3. Song-based Corpus (SC)
3.2. Processing of Singing Voice Corpora
Chapter 4. Synthesis of the Singing Voice
4.1. System Overview
4.2. Unit Selection
4.2.1. Design of two distance functions
4.3. Unit Modification & Concatenation
4.3.1. Time Stretching
4.3.2. Pitch Shifting
4.4. Advanced Vocal Control for Singing Voice
4.4.1. Pitch Contour Smoothing
4.4.2. Unvoiced Ratio Modification
4.4.3. Jitter and Shimmer
4.4.4. Vibrato
4.4.5. Echo
Chapter 5. Results and Discussions
Chapter 6. Conclusions

                                

Full text: not authorized for public release (on-campus or off-campus network).
