GPU based for Large-scale Audio Fingerprinting System | National Tsing Hua University Theses and Dissertations Repository

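Author:	廖維楷 (Liao, Wei-Kai)
Title:	基於GPU加速之巨量音訊指紋系統 (GPU based for Large-scale Audio Fingerprinting System)
Advisors:	張智星 (Jang, Jyh-Shing), 張俊盛 (Chang, Jason S.)
Committee members:	呂仁園 (Ren-Yuan Lyu), 徐嘉連 (Jia-Lien Hsu)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication:	2014
Academic year of graduation:	102 (ROC academic year)
Language:	Chinese
Number of pages:	44
Chinese keywords:	music retrieval, audio fingerprinting, memory, solid-state drive
English keywords:	music retrieval, audio fingerprinting, SSD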

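In this thesis, we build an audio fingerprinting (AFP) system on a large-scale database of 750,000 songs and parallelize its computation on a GPU (graphics processing unit). The system lets a user record, on a mobile phone, a song heard at any time and in any place, submit the recorded clip as the query, and retrieve the most similar song and its related information from the GPU-accelerated audio fingerprinting system.
To overcome the algorithm's limits on song length and on the total number of songs, we improve the landmark extraction step of the AFP computation. When a song is longer than the algorithm's max time limit, its landmarks fall into segments that are discontinuous in time. We overlap these segments by copying specific landmarks and shifting their time points so that they avoid the discontinuity, and then add them to the database. Under different max time settings, this method restores the number of matched landmarks for a song to its normal level, so the large-scale database retains its recognition performance.
Next, so that the massive data can be processed within the limited memory of the CPU and GPU, we split the single database into several sub-databases and improve the way the database is read. This reduces the CPU and GPU memory requirements by 99.84% and 80%, respectively, so the database size is no longer bounded by memory and a large-scale audio fingerprinting system can run on an ordinary personal computer.
Finally, compared with the original system, the improved system spends more time reading from disk. We therefore store the database on an SSD (solid-state drive), which makes reads nearly 6 times faster than on the HDD (hard disk drive) used originally and reduces the time spent on loading.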

The goal of this research is to implement an audio fingerprinting system that works on a large-scale database of 750,000 songs and runs its computation in parallel on a GPU (graphics processing unit). Audio fingerprinting is a fast and robust music retrieval method that lets a user retrieve an intended song and its related information by recording a snippet of the song, even in a noisy environment.
To handle the algorithm's limits on maximum song length and on the number of songs, we improve the landmark extraction step of the audio fingerprinting (AFP) computation. If a song is longer than the maximum time limit, the start times of its landmarks become discontinuous. We copy the landmarks close to the maximum time and shift the copies to avoid the discontinuity, then add the shifted landmarks to the database. This method maintains the number of landmarks under different maximum time settings and thus preserves satisfactory recognition performance on a large-scale database.
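One way to picture the overlap step is sketched below in Python. It is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how landmarks are laid out, so the windowed segment layout, the `overlap` parameter, and the function name `overlap_landmarks` are all hypothetical. The sketch assumes stored landmark times must lie in `[0, max_time)`, cuts a long song into windows of length `max_time` that start every `max_time - overlap` frames, and copies each landmark in an overlap region into the following window with its time shifted.

```python
from typing import List, Tuple

Landmark = Tuple[int, int]             # (absolute start time in frames, hash key)
StoredLandmark = Tuple[int, int, int]  # (segment index, time within segment, hash key)


def overlap_landmarks(landmarks: List[Landmark], max_time: int,
                      overlap: int) -> List[StoredLandmark]:
    """Assign landmarks to overlapping windows of length max_time.

    Hypothetical layout: window k covers absolute times
    [k * stride, k * stride + max_time), with stride = max_time - overlap.
    A landmark inside an overlap region is emitted twice, once per window,
    each time with its time shifted to be relative to that window's start.
    """
    if not 0 <= overlap < max_time:
        raise ValueError("overlap must satisfy 0 <= overlap < max_time")
    stride = max_time - overlap
    stored: List[StoredLandmark] = []
    for t, key in landmarks:
        first = max(0, (t - max_time) // stride + 1)  # earliest window containing t
        last = t // stride                            # latest window containing t
        for seg in range(first, last + 1):
            stored.append((seg, t - seg * stride, key))
    return stored


if __name__ == "__main__":
    song = [(0, 11), (90, 22), (100, 33), (150, 44)]
    for row in overlap_landmarks(song, max_time=100, overlap=20):
        print(row)
```

Under this layout, a query clip no longer than `overlap` frames always falls inside a single window, so all of its landmarks agree on one time offset instead of being split across the discontinuity. With `overlap=0` the function reduces to plain segmentation with no copies.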
In addition, we split the database into several sub-databases and improve the data loading method so that the system can work with a large-scale database in limited memory. With our method, the CPU and GPU memory requirements decrease by 99.84% and 80%, respectively. The system is therefore no longer limited by the available memory capacity and can run on an ordinary personal computer.
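The idea of splitting the database and loading one piece at a time can be sketched as follows. This is a simplified, CPU-only Python illustration, not the thesis's GPU implementation: the shard file format (`pickle`), the function names `build_shards` and `identify`, and the scoring by offset-histogram voting (the usual approach in landmark-based fingerprinting) are assumptions made for the example.

```python
import os
import pickle
import tempfile
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Landmark = Tuple[int, int]  # (start time, hash key)


def build_shards(songs: Dict[int, List[Landmark]], songs_per_shard: int,
                 out_dir: str) -> List[str]:
    """Write one hash table (hash key -> [(song id, time)]) per group of songs."""
    paths: List[str] = []
    song_ids = sorted(songs)
    for i in range(0, len(song_ids), songs_per_shard):
        table = defaultdict(list)
        for sid in song_ids[i:i + songs_per_shard]:
            for t, key in songs[sid]:
                table[key].append((sid, t))
        path = os.path.join(out_dir, f"shard_{len(paths)}.pkl")
        with open(path, "wb") as f:
            pickle.dump(dict(table), f)
        paths.append(path)
    return paths


def identify(query: List[Landmark], shard_paths: List[str]) -> Tuple[int, int]:
    """Scan shards one at a time and return (best song id, vote count)."""
    best = (-1, 0)  # (-1, 0) means no landmark matched in any shard
    for path in shard_paths:
        with open(path, "rb") as f:
            table = pickle.load(f)  # only this shard is held in memory
        votes: Counter = Counter()
        for q_time, key in query:
            for sid, db_time in table.get(key, []):
                # Landmarks from the true song share one consistent offset.
                votes[(sid, db_time - q_time)] += 1
        if votes:
            (sid, _offset), count = votes.most_common(1)[0]
            if count > best[1]:
                best = (sid, count)
        del table  # drop the shard before loading the next one
    return best


if __name__ == "__main__":
    songs = {
        0: [(0, 5), (3, 9), (7, 2)],
        1: [(1, 4), (4, 8), (6, 5)],
        2: [(2, 7), (5, 1), (9, 3)],
    }
    with tempfile.TemporaryDirectory() as tmp:
        shards = build_shards(songs, songs_per_shard=2, out_dir=tmp)
        # Clip of song 1 starting at time 4; query times are relative to the clip.
        print(identify([(0, 8), (2, 5)], shards))
```

Peak memory in this sketch is bounded by the largest shard rather than by the whole database, at the cost of one disk read per shard for every query. That trade-off is why read speed becomes the bottleneck once the database is split.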
Finally, our system is slower than the baseline system because it reads from the database more often. To speed up reading, we store the database on an SSD (solid-state drive), which reads nearly 6 times faster than an HDD (hard disk drive).

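Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1    Introduction
1    Motivation
2    Research Direction
3    Related Work
4    Chapter Overview
Chapter 2    Audio Fingerprinting System
1    Introduction to AFP and System Flow
2    Landmark Extraction
3    Building the Database
3.1    Conversion to Hash Key and Hash Value
3.2    Storing Landmarks
4    Database Matching Method
4.1    Retrieving Hash Values from the Database
4.2    Landmark Lookup: Recovering Song IDs and Computing Offset Times
4.3    Landmark Analysis: Organizing Song IDs and Counting Offset Times
4.4    Returning the Best Song ID
5    Introduction to CUDA (Compute Unified Device Architecture)
6    Introduction to the LATTE System
7    Implementation of LATTE on CUDA
7.1    Landmark Lookup on the GPU
7.2    Landmark Analysis on the GPU
7.3    Landmark Storage on the GPU
Chapter 3    Methods and Implementation
1    Overlapping Time-Discontinuous Landmarks
2    Distributed Sub-Databases and Improved Loading
2.1    Building the Database
2.2    Sequential Recognition Architecture
2.3    Improved Hash Value Loading
2.4    System Comparison
Chapter 4    Experimental Results and Discussion
1    Experimental Setup
2    Analysis of Overlapping Time-Discontinuous Landmarks
2.1    Number of Recovered Landmarks
2.2    Landmark Storage Collisions
2.3    Analysis of Database Information Content
3    Experiments on Building and Loading the Database
3.1    CPU Memory Usage
3.2    GPU Memory Usage
3.3    Recognition Time and Result Analysis
Chapter 5    Conclusions and Future Work
1    Conclusions
2    Future Work
References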

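Full-text release date: this full text is not authorized for public release (on-campus network)
Full-text release date: this full text is not authorized for public release (off-campus network)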