
Author: 邱莉婷 (Chiu, Li-Ting)
Title: Improving HMM-based Pitch Trackers for Polyphonic Music
(適用於複音音樂之HMM音高追蹤器的改良)
Advisor: 張智星 (Jyh-Shing Roger Jang)
Committee members: 蘇文鈺
李俊仁
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2012
Graduation academic year: 100
Language: Chinese
Number of pages: 66
Keywords (Chinese): 音高追蹤、隱藏式馬可夫模型
Keywords (English): Pitch Tracking, Hidden Markov Model
    This thesis proposes a melody extraction method based on the hidden Markov model (HMM) that extracts the singing pitch from polyphonic music containing vocals. Several feature-extraction schemes are combined to select more stable peak features, thereby improving the raw pitch accuracy of the resulting pitch contour.
    First, the fast Fourier transform (FFT) is used to divide the plausible frequency range of the human singing voice into several frequency bands, each treated as a state of the HMM. Exploiting the differences between the human voice and instruments in both the time and frequency domains, part of the background accompaniment is filtered out of the original signal. Methods such as normalized sub-harmonic summation are then applied to extract the most prominent peaks of each frame as features, and a Gaussian mixture model is trained for each state to compute likelihoods. Finally, Viterbi decoding combines the inter-state transition probabilities with these likelihoods to produce a continuous singing pitch contour. Experiments show that the proposed method performs markedly better than any single peak-extraction method.
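The peak-scoring idea can be illustrated with a bare-bones (non-normalized) sub-harmonic summation: each pitch candidate is scored by summing decayed spectral magnitudes at its harmonics. This is only a minimal sketch under assumed parameters (a toy candidate grid, five harmonics, decay 0.8), not the thesis's NSHS implementation:

```python
import numpy as np

def subharmonic_summation(mag, freqs, candidates, n_harmonics=5, decay=0.8):
    """Score each F0 candidate by summing decayed magnitudes at its harmonics."""
    salience = np.zeros(len(candidates))
    for i, f0 in enumerate(candidates):
        for h in range(1, n_harmonics + 1):
            k = np.argmin(np.abs(freqs - h * f0))  # nearest FFT bin to h-th harmonic
            salience[i] += decay ** (h - 1) * mag[k]
    return salience

# Toy magnitude spectrum: harmonics of 200 Hz with 1/h amplitudes on a 10 Hz bin grid
freqs = np.arange(0.0, 2000.0, 10.0)
mag = np.zeros_like(freqs)
for h in range(1, 6):
    mag[int(200 * h / 10)] = 1.0 / h
candidates = np.array([100.0, 150.0, 200.0, 250.0])
best = candidates[np.argmax(subharmonic_summation(mag, freqs, candidates))]  # → 200.0
```

The decay factor keeps the subharmonic candidate (100 Hz), which also matches every even harmonic, from outscoring the true 200 Hz fundamental.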


    This thesis proposes an audio melody extraction method based on the hidden Markov model (HMM) to extract the singing pitches from polyphonic music. The goal is to improve the raw pitch accuracy by applying different methods to select more stable peaks.
    First, the fast Fourier transform is applied to divide a reasonable frequency range of the human voice into several frequency bins, and these bins are treated as different states in the HMM. Second, we take advantage of the temporal and spectral variability between the human voice and instruments to filter out most of the background instruments from the original signal. After that, we apply several methods, such as normalized sub-harmonic summation, to select the peaks of each frame as features, and each state is modeled by a Gaussian mixture model (GMM). Lastly, Viterbi decoding is performed, based on the transition probabilities between states and the state occupation likelihoods, to find a continuous singing pitch contour. Experimental results show that the proposed approach achieves better performance than any single peak-extraction approach.
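The final decoding stage described above is ordinary Viterbi search over the pitch-band states. A minimal log-domain sketch, assuming NumPy and a toy three-state model (the transition matrix and frame likelihoods below are made-up values, not the trained model):

```python
import numpy as np

def viterbi(log_trans, log_like):
    """log_trans: (S, S) log transition probs; log_like: (T, S) per-frame
    log-likelihoods. Returns the most likely state (pitch band) per frame."""
    T, S = log_like.shape
    delta = np.full((T, S), -np.inf)   # best log-score ending in each state
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = log_like[0]             # uniform prior folded out
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (S, S): prev -> current
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(S)] + log_like[t]
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):     # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Sticky transitions; observations favor state 0 for two frames, then state 1
log_trans = np.log([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
log_like = np.log([[0.9, 0.05, 0.05], [0.9, 0.05, 0.05],
                   [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]])
path = viterbi(log_trans, log_like)  # → [0, 0, 1, 1]
```

The sticky diagonal of the transition matrix is what smooths the decoded contour: an isolated noisy frame is not enough to pull the path away from its current pitch band.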

    Contents
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Background
      1.2 Objectives
      1.3 Chapter Overview
    Chapter 2 Related Work
      2.1 Hsu's Method
        2.1.1 Harmonic/Percussive Sound Separation
        2.1.2 Normalized Sub-harmonic Summation
        2.1.3 Singing Pitch Trend Estimation
        2.1.4 DP-based Pitch Extraction
      2.2 HMM-based Melody Extraction
        2.2.1 Feature Extraction
        2.2.2 State Transition Probability
      2.3 Pai's Method
        2.3.1 System Architecture
        2.3.2 Detection and Analysis of Unstable Pitches
      2.4 Salamon's Method
        2.4.1 Sinusoid Extraction
        2.4.2 Salience Function
        2.4.3 Pitch Contour Creation
        2.4.4 Melody Selection
    Chapter 3 Method
      3.1 Hidden Markov Model
      3.2 System Architecture
      3.3 Feature Extraction and Normalization
        3.3.1 NSHS
        3.3.2 ACF (Auto-correlation Function)
        3.3.3 AMDF (Average Magnitude Difference Function)
        3.3.4 NSDF (Normalized Square Difference Function)
      3.4 State Transition Probability and Likelihood
        3.4.1 Data Distribution across Frequency Bands
        3.4.2 Computing Transition Probabilities
        3.4.3 Computing Likelihoods
      3.5 Viterbi Decoding
    Chapter 4 Experimental Results and Analysis
      4.1 Training and Test Corpora
      4.2 Experiment 1: Varying the Number of Gaussian Mixtures
        4.2.1 Purpose and Setup
        4.2.2 Results and Analysis
      4.3 Experiment 2: Feature-Extraction Methods without a Silence State
        4.3.1 Purpose and Setup
        4.3.2 Results and Analysis
      4.4 Experiment 3: Feature-Extraction Methods with a Silence State
        4.4.1 Purpose and Setup
        4.4.2 Results and Analysis
      4.5 Overall Results
    Chapter 5 Conclusions and Suggestions
      5.1 Conclusions
      5.2 Future Work
    References
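Among the frame-level features the outline names (NSHS, ACF, AMDF, NSDF), the autocorrelation function is the simplest to sketch: pick the ACF peak whose lag falls inside the plausible vocal range. The sampling rate, frame length, and search range below are illustrative assumptions, not the thesis's settings:

```python
import numpy as np

def acf_pitch(frame, sr, fmin=80.0, fmax=1000.0):
    """Estimate F0 as the lag of the ACF peak inside the plausible vocal range."""
    n = len(frame)
    acf = np.correlate(frame, frame, mode="full")[n - 1:]  # lags 0 .. n-1
    lo, hi = int(sr / fmax), int(sr / fmin)                # lag search window
    lag = lo + np.argmax(acf[lo:hi])
    return sr / lag

# A pure 220 Hz tone; the estimate is quantized to integer lags (≈ 222 Hz here)
sr = 8000
t = np.arange(1024) / sr
est = acf_pitch(np.sin(2 * np.pi * 220.0 * t), sr)
```

Restricting the lag window is essential: without it the trivial maximum at lag 0 would always win, and lags beyond `sr / fmin` would admit subharmonic (octave-down) errors.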

    [1] M. Goto, “A Real-Time Music Scene Description System: Predominant-F0 Estimation for Detecting Melody and Bass Lines in Real-World Audio Signals,” Speech Communication, vol. 43, no. 4, pp. 311–329, 2004.
    [2] G. E. Poliner and D. P. W. Ellis, “A Classification Approach to Melody Transcription,” 6th ISMIR, pp. 161–166, 2005.
    [3] Chao-Ling Hsu and Jyh-Shing Roger Jang, “Singing Pitch Extraction at MIREX 2010,” The Music Information Retrieval Evaluation eXchange (MIREX), 2010.
    [4] 白宗儒 (Pai), “A Hybrid Method for Pitch Tracking in Polyphonic Music” (一個適用於複音音樂之音高追蹤的混成法), Master's thesis, National Tsing Hua University, 2011.
    [5] M. Ryynänen and A. Klapuri, “Transcription of the Singing Melody in Polyphonic Music,” 7th ISMIR, pp. 222–227, 2006.
    [6] Chao-Ling Hsu, Liang-Yu Chen, Jyh-Shing Roger Jang, and Hsing-Ji Li, “Singing Pitch Extraction from Monaural Polyphonic Songs by Contextual Audio Modeling and Singing Harmonic Enhancement,” 10th ISMIR, Kobe, Japan, Oct. 2009.
    [7] J. Salamon and E. Gómez, “Melody Extraction from Polyphonic Music Signals Using Pitch Contour Characteristics,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 6, pp. 1759–1770, Aug. 2012.
    [8] N. Ono, K. Miyamoto, J. Le Roux, H. Kameoka, and S. Sagayama, “Separation of a Monaural Audio Signal into Harmonic/Percussive Components by Complementary Diffusion on Spectrogram,” Proceedings of EUSIPCO, 2008.
    [9] H. Tachibana, T. Ono, N. Ono, and S. Sagayama, “Melody Line Estimation in Homophonic Music Audio Signals Based on Temporal-Variability of Melody Source,” IEEE ICASSP, pp. 425–428, 2010.
    [10] D. J. Hermes, “Measurement of Pitch by Subharmonic Summation,” Journal of the Acoustical Society of America, vol. 83, pp. 257–264, 1988.
    [11] K. Dressler, “Sinusoidal Extraction Using an Efficient Implementation of a Multi-Resolution FFT,” DAFx, pp. 247–252, 2006.
    [12] D. W. Robinson and R. S. Dadson, “A Re-determination of the Equal-Loudness Relations for Pure Tones,” British Journal of Applied Physics, vol. 7, pp. 166–181, 1956.
    [13] H. Fletcher and W. A. Munson, “Loudness, Its Definition, Measurement and Calculation,” Journal of the Acoustical Society of America, vol. 5, pp. 82–108, 1933.
    [14] J. L. Flanagan and R. M. Golden, “Phase Vocoder,” Bell System Technical Journal, vol. 45, pp. 1493–1509, 1966.
    [15] A. Klapuri, “Multiple Fundamental Frequency Estimation by Summing Harmonic Amplitudes,” 7th ISMIR, Victoria, Canada, pp. 216–221, Oct. 2006.
    [16] Lawrence R. Rabiner, “On the Use of Autocorrelation Analysis for Pitch Detection,” IEEE Trans. ASSP, vol. 25, pp. 24–33, Feb. 1977.
    [17] M. J. Ross, H. L. Shaffer, A. Cohen, R. Freudberg, and H. J. Manley, “Average Magnitude Difference Function Pitch Extractor,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-22, pp. 353–362, Oct. 1974.
    [18] P. McLeod and G. Wyvill, “A Smarter Way to Find Pitch,” Proc. International Computer Music Conference, Barcelona, Spain, pp. 138–141, Sep. 2005.

    Full-text release: not authorized for public access (campus network)
    Full-text release: not authorized for public access (off-campus network)
