Author: | Jao, Yen-Chang (饒彥章) |
---|---|
Thesis title: | Improving Linear Scaling for Query-by-Singing/Humming (改進線性伸縮以用於哼唱選歌) |
Advisors: | Jang, Jyh-Shing (張智星); Chang, Jason S. (張俊盛) |
Oral defense committee: | Renyuan Lyu (呂仁園); Jia-Lien Hsu (徐嘉連) |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication: | 2014 |
Academic year of graduation: | 102 (ROC calendar) |
Language: | Chinese |
Number of pages: | 45 |
Keywords: | music retrieval, query-by-singing/humming, linear scaling, golden section search, sorted error vector |
This thesis proposes an integrated framework that improves both the efficiency and the effectiveness of a query-by-singing/humming (QBSH) system. The framework combines three methods. Method 1 uses golden section search to reduce the computation time of the traditional linear scaling (LS) algorithm. Method 2 assigns different weights to rests in the pitch vectors (of both the database songs and the user queries), so that rests have less effect on the distance computation. Method 3 applies the idea of a sorted error vector when comparing pitch vectors: it ignores the distance values that are overly large and computes the matching distance from the remaining ones. This reduces the effect of short-lived pitch deviations, which arise when the singer is out of tune or when pitch tracking makes errors.
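As a rough illustration of Method 1, the sketch below searches for the scaling factor of linear scaling with golden section search instead of trying every candidate factor. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `ls_distance` and `golden_section_ls`, the factor range 0.5 to 2.0, the mean-based key alignment, and the mean absolute error are all hypothetical choices made for this example. Golden section search only finds the global minimum if the distance is unimodal in the scaling factor, which is assumed here.

```python
import math
import numpy as np

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def ls_distance(query, song, factor):
    """Distance between the query stretched by `factor` and the song's opening frames."""
    # Target length after scaling, clamped to a usable range (assumption of this sketch).
    n = max(2, min(len(song), int(round(len(query) * factor))))
    # Linearly resample the query pitch vector to n frames.
    positions = np.linspace(0, len(query) - 1, n)
    scaled = np.interp(positions, np.arange(len(query)), query)
    target = song[:n]
    # Remove the key difference by aligning the means (hypothetical choice).
    scaled = scaled - scaled.mean() + target.mean()
    return float(np.mean(np.abs(scaled - target)))


def golden_section_ls(query, song, lo=0.5, hi=2.0, tol=0.01):
    """Return (best_factor, best_distance), assuming the distance is unimodal in the factor."""
    query = np.asarray(query, dtype=float)
    song = np.asarray(song, dtype=float)
    hi = min(hi, len(song) / len(query))  # the stretched query must fit in the song

    def f(s):
        return ls_distance(query, song, s)

    # Two interior probe points that divide [lo, hi] in the golden ratio.
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    best = (lo + hi) / 2
    return best, f(best)
```

Each iteration costs one new distance evaluation and shrinks the search interval by a factor of about 0.618, so the number of evaluations grows only logarithmically with the required precision, whereas an exhaustive scan grows linearly with the number of candidate factors.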
The proposed framework improves on the baseline LS-based QBSH system in both computation time (via Method 1) and recognition accuracy (via Methods 2 and 3). In our experiments on the MIR-QBSH dataset, it achieves an error reduction rate of 21.4% and a 49.3% decrease in computation time.
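The two accuracy-oriented methods can be sketched together as one distance function. This is only an illustration under stated assumptions: the name `sev_distance`, the boolean `rest_mask` (marking frames where either vector has a rest, with rest frames already filled with some pitch value), the rest weight of 0.5, and the keep ratio of 0.8 are hypothetical and are not taken from the thesis.

```python
import numpy as np


def sev_distance(query, song, rest_mask=None, rest_weight=0.5, keep_ratio=0.8):
    """Weighted, trimmed mean absolute pitch error between equal-length pitch vectors.

    rest_weight must be positive; keep_ratio is the fraction of frames kept.
    """
    query = np.asarray(query, dtype=float)
    song = np.asarray(song, dtype=float)
    if rest_mask is None:
        rest_mask = np.zeros(len(query), dtype=bool)
    rest_mask = np.asarray(rest_mask, dtype=bool)

    # Rest weighting: frames marked as rests contribute with a reduced weight.
    weights = np.where(rest_mask, rest_weight, 1.0)
    weighted_err = weights * np.abs(query - song)

    # Sorted error vector: sort the frame errors and drop the largest ones.
    order = np.argsort(weighted_err)
    n_keep = max(1, int(round(keep_ratio * len(order))))
    kept = order[:n_keep]

    # Weighted mean over the frames that were kept.
    return float(weighted_err[kept].sum() / weights[kept].sum())


# Toy example: one frame of the query is off by an octave (12 semitones).
song = [60, 60, 62, 62, 64, 64, 65, 65, 67, 67]
query = [60, 60, 62, 62, 76, 64, 65, 65, 67, 67]
print(sev_distance(query, song))                  # 0.0, the outlier frame is dropped
print(sev_distance(query, song, keep_ratio=1.0))  # 1.2, the outlier frame is kept
```

In the toy example, discarding the largest 20% of the frame errors removes the single octave error entirely, which mirrors the abstract's motivation of suppressing short pitch deviations caused by out-of-tune singing or pitch-tracking errors.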