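| Graduate student: | 廖珮妤 Liao, Pei-Yu |
|---|---|
| Thesis title: | 用於音樂檢索的聲紋辨識改良 Improving Audio Fingerprinting for Music Retrieval |
| Advisors: | 張智星, 張俊盛 |
| Oral defense committee: | 張智星, 張俊盛, 呂仁園, 王新民 |
| Degree: | Master |
| Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
| Year of publication: | 2013 |
| Academic year of graduation: | 101 |
| Language: | Chinese |
| Number of pages: | 71 |
| Keywords (Chinese): | 音樂檢索 (music retrieval), 聲紋辨識 (audio fingerprinting), 支援向量機 (support vector machine), 信心度測量 (confidence measure), 分段查詢 (segmental music query) |
| Keywords (English): | confidence measure, segmental music query |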
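In this thesis, we improve on existing audio fingerprinting (AFP) technology. Audio fingerprinting is a fast music retrieval method: a user records a clip of music playing in a noisy environment and uses it as the query, and the audio fingerprinting system finds the song that best matches the music being played.
To raise the recognition rate of our system, we divide query clips into two classes, those for which the correct answer is easy to find and those for which it is difficult, and build a classification mechanism around them. Before recognition, an SVM classifier classifies the query clip; depending on the result, feature extraction is performed four or eight times, and matching then proceeds. In experiments this method achieves a recognition rate of 84.18%, close to the 84.28% obtained with eight rounds of feature extraction, while taking 2% less recognition time than eight rounds of feature extraction.
In addition, we add a query confidence measure to the system to decide whether a query clip is in the database; queries with insufficient confidence are rejected. With the number of landmarks matched against the database per second set to 1.5, about 86% of the query clips that are not in the database are filtered out.
Finally, if the number of landmarks matched against the database within the system's default query duration already exceeds the confidence threshold, the result is returned directly; only otherwise is the duration extended and the query continued. For the experiments, we therefore split each of the query clips used earlier into two equal segments. To avoid losing landmarks at the boundary between the two segments, the search in the second segment starts 15 frames earlier, so that it overlaps the first. Landmark finding is also changed to search in a single direction, forward in time, which avoids re-finding landmarks of the previous segment, a problem caused by back-and-forth landmark search when the query is segmented. In experiments, this method reduces response time by 21% compared with the original method and improves the recognition rate by 2%.
Keywords: music retrieval, audio fingerprinting, landmark, support vector machine, confidence measure, segmental music query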
The goal of this research is to improve current audio fingerprinting techniques. Audio fingerprinting is a fast and convenient music retrieval method that lets a user retrieve an intended song and related information by recording a portion of the song in a noisy environment.
To improve the recognition rate of our system, we classify each query segment into one of two classes: easy or difficult to match to the intended song. Before a query segment is recognized, an SVM classifier assigns it to a class. Depending on the class, we perform landmark finding 4 or 8 times on the query and then carry out the matching step as usual. Our method achieves a recognition rate of 84.18%, close to the 84.28% obtained when landmark finding is always performed 8 times, and it takes 2% less matching time than that 8-pass configuration.
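The dispatch step described above can be sketched as follows. This is only a minimal illustration, not the thesis implementation: the feature vector, the weights, and the bias are hypothetical placeholders, a linear decision function stands in for the trained SVM, and the mapping of the "difficult" class to 8 passes and the "easy" class to 4 passes is an assumption inferred from the abstract.

```python
from typing import Sequence

EASY_PASSES = 4   # assumed: "easy" queries get 4 landmark-finding passes
HARD_PASSES = 8   # assumed: "difficult" queries get 8 landmark-finding passes


def svm_decision(features: Sequence[float],
                 weights: Sequence[float],
                 bias: float) -> float:
    """Linear decision value w.x + b (stand-in for a trained SVM)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def choose_landmark_passes(features: Sequence[float],
                           weights: Sequence[float],
                           bias: float) -> int:
    """Return the number of landmark-finding passes for one query.

    A non-negative decision value is treated as the "difficult" class
    (hypothetical convention chosen for this sketch).
    """
    if svm_decision(features, weights, bias) >= 0:
        return HARD_PASSES
    return EASY_PASSES
```

With the made-up weights `[0.5, -1.0]` and bias `0.0`, a query with features `[3.0, 1.0]` falls on the non-negative side and is given 8 passes, while `[1.0, 2.0]` is given 4.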
In addition, we add a verification mechanism based on a confidence measure to determine whether a query is in our database. If the confidence is below a threshold, the system rejects the query. With the matched landmark count per second set to 1.5, about 86% of the query segments that are not in our database are filtered out.
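A minimal sketch of this rejection rule, assuming the confidence score is simply the matched landmark count divided by the query duration in seconds. The function name and the choice to accept a query that sits exactly on the threshold are assumptions made for illustration.

```python
def accept_query(matched_landmarks: int,
                 duration_sec: float,
                 threshold_per_sec: float = 1.5) -> bool:
    """Accept the query only if its matched-landmark rate reaches the threshold.

    The default of 1.5 matched landmarks per second is the setting quoted
    in the abstract; the rest of the rule is an assumed formulation.
    """
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    rate = matched_landmarks / duration_sec
    return rate >= threshold_per_sec
```

For example, a 10-second query with 20 matched landmarks (2.0 per second) would be accepted, while one with 10 matched landmarks (1.0 per second) would be rejected as probably not in the database.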
Finally, if the matched landmark count within the preset query duration exceeds the confidence threshold, our system returns the result directly. Otherwise, the system extends the duration of the query segment and continues searching and matching. For the experiments, we therefore divide each query into two parts of equal length. To avoid missing landmarks at the boundary between the two parts, the second part is extended backward so that it overlaps the first by 15 frames. We also search for landmarks in the forward direction only, which avoids finding duplicate landmarks from the first part, a problem that arises when landmarks are searched bidirectionally. Compared with the original method, this approach reduces response time by 21% and improves the recognition rate by 2%.
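The two-stage query can be sketched as below. The 15-frame overlap comes from the abstract; everything else is an assumption for illustration: `match` is a hypothetical callback that takes a list of `(start, end)` frame ranges and returns a `(song_id, matched_count)` pair, and how the second stage combines the two parts is not specified in the abstract, so the sketch simply passes both ranges to the matcher.

```python
from typing import Callable, List, Tuple

FrameRange = Tuple[int, int]        # (start, end), end exclusive
OVERLAP_FRAMES = 15                 # overlap quoted in the abstract


def split_with_overlap(num_frames: int,
                       overlap: int = OVERLAP_FRAMES
                       ) -> Tuple[FrameRange, FrameRange]:
    """Split a query into two equal parts; the second starts `overlap` frames early."""
    mid = num_frames // 2
    first = (0, mid)
    second = (max(0, mid - overlap), num_frames)
    return first, second


def staged_query(num_frames: int,
                 match: Callable[[List[FrameRange]], Tuple[str, int]],
                 threshold: int) -> Tuple[str, int]:
    """Return (song_id, stages_used), stopping early when stage 1 is confident."""
    first, second = split_with_overlap(num_frames)
    song, count = match([first])
    if count > threshold:
        return song, 1              # confident after the first part alone
    song, _ = match([first, second])
    return song, 2                  # needed the extended query
```

For a 100-frame query, `split_with_overlap` yields the ranges `(0, 50)` and `(35, 100)`, so landmarks that straddle frame 50 are still visible to the second stage.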
Keywords: music retrieval, audio fingerprinting, landmark, SVM, confidence measure, segmental music query