
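Graduate student: 卓真弘 (Zenhon Zhuo)
Thesis title: On the Improvement of Landmark-based Audio Fingerprinting (改進以地標為基礎的音訊指紋辨識)
Advisors: 張智星 (Jyh-Shing Roger Jang), 王炳豐 (Biing-Feng Wang)
Oral defense committee: 冀泰石 (Tai-Shih Chi), 劉奕汶 (Yi-Wen Liu)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2016
Academic year of graduation: 104
Language: Chinese
Number of pages: 54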
Keywords (Chinese): 音樂檢索, 聲紋辨識, 重新排序, Pranking, Ranking SVM, 基因演算法
Keywords (English): music retrieval, audio fingerprinting, re-ranking, Pranking, Ranking SVM, Genetic Algorithm
  • Abstract
    Audio fingerprinting (AFP) is a fast approach to music retrieval. A segment of music is recorded through a microphone, for example on a cellphone or tablet, and sent to a recognition server for AFP computation; the best-matching result is then returned to the user. This thesis describes several audio fingerprinting methods, proposes improvements to the preprocessing stage, and uses a new feature together with machine learning to improve the original re-ranking method. In preprocessing, we set power-spectrum values whose energy is below 0 to 0, because these values are not robust to noise, and we add a high-pass filter along the frequency direction. To improve re-ranking, we extract a new feature using the method of Haitsma [8] and compute its similarity. We then use machine learning to find a weighted sum of the similarity of the new feature and the similarities of the two original features. Three machine learning methods are used: Pranking, Ranking SVM, and a genetic algorithm; the genetic algorithm performs best. The final result raises the recognition rate from 81.21% to 86.04%.
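The two ideas named in the abstract, clamping negative spectrum values before filtering along frequency and re-ranking candidates by a weighted sum of three similarity scores, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the function names, the first-order difference used as a stand-in for the high-pass filter, the song names, the similarity scores, and the weights. The thesis learns its weights with Pranking, Ranking SVM, or a genetic algorithm, which this sketch does not implement.

```python
from typing import Dict, List, Sequence, Tuple


def clamp_and_highpass(frame: Sequence[float]) -> List[float]:
    """Preprocess one spectrum frame (values indexed by frequency bin).

    Step 1 sets values below 0 to 0, as the abstract describes.
    Step 2 applies a first-order difference along the frequency axis,
    which is only an assumed stand-in for the thesis's high-pass filter.
    """
    if not frame:
        return []
    clamped = [max(v, 0.0) for v in frame]
    # y[k] = x[k] - x[k-1]; the first bin has no predecessor and is kept as-is.
    return [clamped[0]] + [clamped[k] - clamped[k - 1] for k in range(1, len(clamped))]


def rerank(candidates: Dict[str, Tuple[float, float, float]],
           weights: Tuple[float, float, float]) -> List[str]:
    """Sort candidate songs by a weighted sum of three similarity scores."""
    def score(sims: Tuple[float, float, float]) -> float:
        return sum(w * s for w, s in zip(weights, sims))
    return sorted(candidates, key=lambda song: score(candidates[song]), reverse=True)


# Hypothetical similarity scores: (original feature 1, original feature 2, new feature).
candidates = {
    "song_a": (0.9, 0.2, 0.1),
    "song_b": (0.6, 0.6, 0.8),
    "song_c": (0.1, 0.3, 0.2),
}
weights = (0.5, 0.2, 0.3)  # placeholder values; the thesis learns these from data
ranking = rerank(candidates, weights)
frame = clamp_and_highpass([-1.5, 2.0, 5.0, 4.0])
```

With these placeholder weights, a candidate that scores moderately on all three features can overtake one that scores highly on a single original feature, which is the kind of reordering a learned weighting is meant to exploit.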

    Table of Contents
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    1 Introduction
        1.1 Research background
        1.2 Research objectives
        1.3 Related work
        1.4 Chapter overview
    2 Audio fingerprinting system
        2.1 System overview and workflow
        2.2 Feature extraction workflow
        2.3 Song identification
        2.4 Re-ranking
        2.5 Anguera's method
        2.6 Burges's method
        2.7 Haitsma's method
    3 Research methods and implementation
        3.1 Concept of the approach
        3.2 Improvements to preprocessing
        3.3 Improved peak-matching method
        3.4 Implementation of the Haitsma-method feature
        3.5 Machine learning methods
    4 Experimental results
        4.1 Experimental environment
        4.2 Experimental datasets
        4.3 Recognition rate and recognition time of the original method
        4.4 Results of improving peak-matching time
        4.5 Experimental results of the improved preprocessing
        4.6 Methods for selecting songs to re-rank
        4.7 Number of songs selected for re-ranking
        4.8 Experimental results of improving re-ranking with machine learning
        4.9 Using the improved preprocessing together with machine learning
    5 Conclusions and future work
        5.1 Conclusions
        5.2 Future work
    References

    List of Figures
    Figure 2.1.1 Architecture of the recognition workflow
    Figure 2.2.1 Feature extraction flowchart
    Figure 2.2.2 Spectrogram; the horizontal axis is time and the vertical axis is frequency
    Figure 2.2.3 Peak map
    Figure 2.2.4 Landmarks formed from peaks
    Figure 2.3.1 Retrieving the hash values for the corresponding hash keys from the hash table
    Figure 2.3.2 Illustration of time offsets
    Figure 2.3.3 Counting MLC
    Figure 2.4.1 Illustration of re-ranking
    Figure 2.4.2 Illustration of MPC matching
    Figure 2.5.1 Block diagram of Anguera's method
    Figure 2.6.1 Illustration of the two-layer neural network
    Figure 2.6.2 Illustration of projection
    Figure 2.7.1 Illustration of time differences
    Figure 2.7.2 Features of Haitsma's method
    Figure 3.2.1 Illustration of the two-channel problem
    Figure 3.2.2 Effects caused by the two-channel problem
    Figure 3.2.3 Directions of the two filters
    Figure 3.3.1 Illustration of peak-table matching
    Figure 3.4.1 Illustration of matching Haitsma-method features
    Figure 3.5.1 Illustration after projection
    Figure 3.5.2 Pranking: updating b
    Figure 3.5.3 Pranking: updating w
    Figure 3.5.4 Illustration of Ranking SVM model correction
    Figure 3.5.5 Genetic algorithm flowchart
    Figure 3.5.6 Genetic algorithm: crossover
    Figure 3.5.6 Genetic algorithm: mutation
    Figure 4.5.1 Baina test data: recognition rate versus th
    Figure 4.5.2 Baina test data: average time used versus th
    Figure 4.6.1 Comparison of three methods for selecting songs to re-rank

    References
    [1] Shazam, web resource, available: http://www.shazam.com/
    [2] SoundHound, web resource, available: http://www.soundhound.com/
    [3] TrackID in Google Play, web resource, available: https://play.google.com/store/apps/details?id=com.sonyericsson.trackid
    [4] Avery Li-Chun Wang, An Industrial-Strength Audio Search Algorithm, ISMIR, 2003.
    [5] C. C. Wang, M. H. Lin, J. S. R. Jang, W. Liou, An Effective Re-ranking Method Based on Learning to Rank for Improving Audio Fingerprinting, APSIPA, 2014.
    [6] Christopher J. C. Burges, John C. Platt, and Soumya Jana, Distortion Discriminant Analysis for Audio Fingerprinting, IEEE Transactions on Speech and Audio Processing, 2003.
    [7] Xavier Anguera, Antonio Garzon, and Tomasz Adamek, MASK: Robust Local Features for Audio Fingerprinting, IEEE International Conference on Multimedia and Expo, 2012.
    [8] Jaap Haitsma, Ton Kalker, A Highly Robust Audio Fingerprinting System, ISMIR, 2002.
    [9] H. Malvar, Auditory masking in audio compression, in Audio Anecdotes, K. Greenebaum, Ed. New York: Peters, 2001.
    [10] Henrique Malvar, A Modulated Complex Lapped Transform and its Applications to Audio Processing, IEEE International Conference on Acoustics, Speech, and Signal Processing, 1999.
    [11] [Online]. Available: http://www.syntrillium.com/cooledit.
    [12] Koby Crammer, Yoram Singer, Pranking with Ranking, Proceedings of the Conference on Neural Information Processing Systems (NIPS), 2002.
    [13] K. Diamantaras and S. Kung, Principal Component Neural Networks. New York: Wiley, 1996.
    [14] Thorsten Joachims, Optimizing Search Engines using Clickthrough Data, in Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2002.

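    Full-text release date: full text not authorized for public release (on-campus network)
    Full-text release date: full text not authorized for public release (off-campus network)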
