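Graduate student: | 邱莉婷 Chiu, Li-Ting |
---|---|
Thesis title: | 適用於複音音樂之HMM音高追蹤器的改良 Improving HMM-based Pitch Trackers for Polyphonic Music |
Advisor: | 張智星 (Jyh-Shing Roger Jang) |
Oral defense committee: | 蘇文鈺, 李俊仁 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications |
Year of publication: | 2012 |
Academic year of graduation (ROC calendar): | 100 |
Language: | Chinese |
Number of pages: | 66 |
Keywords (Chinese): | 音高追蹤、隱藏式馬可夫模型 |
Keywords (English): | Pitch Tracking, Hidden Markov Model |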
This thesis proposes a melody extraction method based on a hidden Markov model (HMM) that extracts the singing pitch from polyphonic music containing vocals. The goal is to improve the raw pitch accuracy of the pitch contour by combining different feature extraction methods to select more stable peak features.
First, the fast Fourier transform (FFT) is used to divide the plausible frequency range of the human singing voice into several frequency bins, each of which is treated as a state of the HMM. Second, we exploit the differences between the singing voice and instruments in the time and frequency domains to filter part of the background accompaniment out of the original signal. Third, several methods, such as normalized sub-harmonic summation, are applied to extract the most prominent peaks in each frame as features, and a Gaussian mixture model (GMM) is trained for each state to compute the likelihood. Finally, Viterbi decoding combines the transition probabilities between states with the state likelihoods to produce a continuous singing pitch contour. Experimental results show that the proposed approach clearly outperforms approaches that rely on a single peak extraction method.
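As a rough illustration of the final decoding step, the sketch below runs Viterbi decoding over a small set of pitch states in the log domain. Everything in it is an assumption made for illustration: the function name `viterbi`, the three-state toy model, and all probability values are hypothetical and not taken from the thesis. In the thesis the per-frame likelihoods come from per-state GMMs, whereas here they are hard-coded.

```python
import math


def viterbi(log_likelihood, log_transition, log_prior):
    """Return the most likely state sequence (one state index per frame).

    log_likelihood[t][s]: log-likelihood of frame t under state s
    log_transition[i][j]: log-probability of moving from state i to state j
    log_prior[s]:         log-probability of starting in state s
    """
    n_states = len(log_prior)
    # Best log-score of any path ending in each state at the first frame.
    score = [log_prior[s] + log_likelihood[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_likelihood[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            # Best predecessor state for arriving in state j.
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_transition[i][j])
            new_score.append(score[best_i] + log_transition[best_i][j] + frame[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model: 3 pitch states, transitions favour staying put.
transition = [[0.80, 0.15, 0.05],
              [0.15 / 1.1, 0.80 / 1.1, 0.15 / 1.1],
              [0.05, 0.15, 0.80]]
prior = [1 / 3, 1 / 3, 1 / 3]
# Per-frame state likelihoods; frame 2 has a spurious peak in state 2.
likelihood = [[0.7, 0.2, 0.1],
              [0.7, 0.2, 0.1],
              [0.3, 0.2, 0.5],
              [0.7, 0.2, 0.1]]

log_lik = [[math.log(p) for p in row] for row in likelihood]
log_trans = [[math.log(p) for p in row] for row in transition]
log_prior = [math.log(p) for p in prior]

framewise = [max(range(3), key=lambda s: row[s]) for row in likelihood]
smoothed = viterbi(log_lik, log_trans, log_prior)
print(framewise)  # [0, 0, 2, 0]
print(smoothed)   # [0, 0, 0, 0]
```

In this toy case, picking the strongest state frame by frame jumps to state 2 at the third frame, while the decoded path stays in state 0 because the transition probabilities penalize the jump. This is the kind of continuity that the decoding step is meant to provide.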