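Graduate student: | 白宗儒 Pai, Tsung-Ju |
---|---|
Thesis title: | 一個適用於複音音樂之音高追蹤的混成法 A Hybrid Method for Pitch Tracking of Polyphonic Audio Music |
Advisor: | 張智星 Jang, Jyh-Shing |
Oral defense committee: | 王新民 Wang, Hsin-Min; 蔡偉和 Tsai, Wei-Ho |
Degree: | 碩士 Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2011 |
Academic year of graduation: | 99 (ROC calendar) |
Language: | Chinese |
Number of pages: | 39 |
Chinese keywords: | 音高追蹤 (pitch tracking), 隱藏式馬可夫模型 (hidden Markov model), 頻譜分析 (spectrum analysis) |
English keywords: | Pitch Tracking, Hidden Markov Model |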
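People can usually identify the main melody of a piece of music with ease, especially in vocal music, where the singing pitch is usually the main melody. Identifying the singing pitch directly by computer, however, is quite difficult, because to a computer the background music is like noise that interferes with the voice. In this thesis we use a baseline method built from a series of spectrum analysis algorithms, most of which aim to boost the voice and suppress the music so that pitch tracking by dynamic programming becomes more accurate. This method, however, tends to produce wrong pitches at the endpoints of the voice and in regions where the frequency changes quickly. We therefore reverse the time axis of the signal to obtain a different result, overlay the two results to find unstable pitches, and, aided by a pitch extraction method trained with hidden Markov models, use voting to correct the unstable pitches. The proposed method improves the baseline's accuracy in its weak regions, which clearly raises the overall recognition rate.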
Humans can easily recognize the main melody of a piece of music, especially a song with a singing voice, because the pitch of the singing voice usually represents the main melody. However, it is not as easy for a computer to detect the singing pitch automatically, because the background music acts as a signal that interferes with the singing voice. In this thesis we start from a baseline method composed of a series of spectrum analysis algorithms. Most of these algorithms aim to enhance the singing voice while attenuating the background music, so that the accuracy of dynamic-programming-based pitch tracking improves. However, the baseline tends to yield incorrect pitch values near the endpoints of the singing voice and in segments with fast-varying frequency. We therefore reverse the time axis of the song signal to obtain a second, different pitch track, and overlap it with the track from the original signal to find the segments whose pitch values are unreliable. A second singing pitch extraction method, based on hidden Markov models, assists this step, and a voting mechanism is used to verify and correct the unreliable pitch values. Experimental results show that the proposed method outperforms the original baseline system in terms of raw pitch accuracy.
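The correction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not give the disagreement threshold or the voting rule, so the names `find_unreliable`, `vote`, `correct_pitch`, the tolerance `tol` (in semitones), and the "most supported candidate wins" rule are all hypothetical. Each pitch track is assumed to be a per-frame list of pitch values in semitones, and the backward track is assumed to come from running the baseline on the time-reversed signal.

```python
from typing import List, Tuple


def find_unreliable(forward: List[float], backward_realigned: List[float],
                    tol: float = 0.5) -> List[bool]:
    """Flag frames where the forward and realigned backward tracks disagree.

    Assumption: a difference above `tol` semitones marks a frame as unreliable.
    """
    return [abs(f - b) > tol for f, b in zip(forward, backward_realigned)]


def vote(candidates: List[float], tol: float = 0.5) -> float:
    """Return the candidate that the most candidates agree with (within tol).

    Assumption: ties are resolved in favour of the earliest candidate.
    """
    best, best_support = candidates[0], -1
    for c in candidates:
        support = sum(1 for other in candidates if abs(c - other) <= tol)
        if support > best_support:
            best, best_support = c, support
    return best


def correct_pitch(forward: List[float], backward: List[float], hmm: List[float],
                  tol: float = 0.5) -> Tuple[List[float], List[bool]]:
    """Correct unreliable frames of the forward track by voting.

    `backward` is in reversed time order, so it is flipped to line up
    frame by frame with `forward` and `hmm`.
    """
    backward_realigned = backward[::-1]
    unreliable = find_unreliable(forward, backward_realigned, tol)
    corrected = [
        vote([f, b, h], tol) if bad else f
        for f, b, h, bad in zip(forward, backward_realigned, hmm, unreliable)
    ]
    return corrected, unreliable


# Toy example with made-up values: the forward track has an octave error
# at frame 2, and the backward track has an error at the last frame.
forward = [60.0, 60.0, 72.0, 62.0, 62.0]
backward = [50.0, 62.0, 60.0, 60.0, 60.0]   # reversed time order
hmm = [60.0, 60.0, 60.0, 62.0, 62.0]
corrected, unreliable = correct_pitch(forward, backward, hmm)
```

Only the frames flagged as unreliable are changed, so the baseline output is kept wherever the forward and backward passes already agree.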