
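Graduate student: 郭景揚 (Kuo, Chin-Yang)
Thesis title: 使用GPU加速哼唱選歌比對 (Accelerating query by singing/humming on GPU)
Advisors: 張智星 (Jang, Jyh-Shing Roger), 張俊盛 (Chang, Jason S.)
Oral defense committee: 李哲榮, 鍾葉青
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar, 2012-2013)
Language: Chinese
Number of pages: 52
Keywords (Chinese): 音樂檢索 (music retrieval), 哼唱選歌 (query by singing/humming)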
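  • Query by Singing/Humming (QBSH) is a technique that takes a user's singing or humming as input and finds the best-matching song in a song database. This thesis investigates how to improve the matching algorithm of a QBSH system, using the parallelism of GPU computing to accelerate matching while also raising the recognition rate.
    Our laboratory's current QBSH system uses two matching algorithms: Linear Scaling (LS) and Dynamic Time Warping (DTW), which has the higher time complexity. This thesis ports the DTW algorithm from the CPU to the GPU and proposes a modification that makes the DTW computation highly parallel, so that it uses the GPU's computing power effectively. When only the original key is used as the input pitch for testing, with no other transpositions generated, a DTW query against the entire database takes 149 seconds on the CPU, whereas the GPU DTW algorithm takes only 0.4 to 0.8 seconds, a speedup of 200 to 350 times.
    The test queries come from the MIR laboratory: humming clips recorded in 2011 by students of the scientific computing course in the Department of Computer Science at National Tsing Hua University, 818 recordings in total, each about eight to ten seconds long. With the DTW matching implemented in this thesis, the Top-10 recognition rate reaches 69.9%.
    With the DTW matching time greatly reduced, more key transpositions can be tried to raise the recognition rate while keeping the system's response time acceptable to users, allowing a DTW-based QBSH system to serve more users at the same time.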


    Query by singing/humming (QBSH) is a technique that takes a user's singing or humming as input and retrieves the best-matching song from a song database. This study examines how to improve the matching algorithm of a QBSH system by exploiting GPU parallel computing, making matching more efficient while also improving the recognition rate.
    The QBSH system discussed in this study currently uses two matching algorithms: linear scaling (LS) and the more time-consuming dynamic time warping (DTW). This study ports the DTW algorithm from the original CPU environment to the GPU and proposes a modification that lets the DTW computation run with a high degree of parallelism, making effective use of the GPU's computing power. Using only the original key as input, without any key transposition, a CPU search through the entire song database takes 149 seconds, whereas the DTW algorithm running on the GPU takes only 0.4 to 0.8 seconds, a speedup of roughly 200 to 350 times.
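    For readers unfamiliar with DTW matching, the following is a minimal sketch of how a hummed pitch sequence could be compared against every song in a database. It uses the textbook DTW recurrence with an absolute-difference local cost; the function names, the step pattern, and the pitch representation (semitone numbers) are assumptions made for illustration and are not the Type-1 or Type-2 DTW variants or the GPU kernel of the thesis.

```python
def dtw_distance(query, song):
    """Textbook DTW distance between two pitch sequences (in semitones).

    Assumption: local cost is |query[i] - song[j]| and the allowed steps are
    (i-1, j), (i, j-1) and (i-1, j-1). The thesis may use other constraints.
    """
    n, m = len(query), len(song)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning query[:i] with song[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - song[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat a song frame
                                     cost[i][j - 1],      # repeat a query frame
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def rank_songs(query, database):
    """Return song names sorted by ascending DTW distance to the query.

    Each song is scored independently of the others, which is one reason
    this kind of search lends itself to parallel hardware.
    """
    scores = {name: dtw_distance(query, pitch) for name, pitch in database.items()}
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical two-song database of pitch sequences.
    database = {"song_a": [60, 60, 62, 64, 64], "song_b": [70, 72, 74]}
    print(rank_songs([60, 62, 64], database))
```

    Because each cell depends only on its left, upper, and upper-left neighbours, the table costs on the order of n × m operations per song, which is why a sequential search over a whole database is slow.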
    The test corpus comes from the MIR laboratory and consists of humming clips recorded by students of the 2011 scientific computing class at National Tsing Hua University: 818 audio clips in total, each about 8 to 10 seconds long. With the DTW matching implemented in this study, the QBSH system achieves a top-10 recognition rate of 69.9%.
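    A top-N recognition rate such as the one quoted above is usually computed as the fraction of queries whose correct song appears among the first N returned candidates. The sketch below shows that calculation under this assumed definition; the function name and the toy data are hypothetical and do not come from the thesis.

```python
def top_n_rate(ranked_results, truths, n=10):
    """Fraction of queries whose true song is within the first n candidates.

    ranked_results: one ranked list of song names per query (best first).
    truths: the correct song name for each query, in the same order.
    """
    if len(ranked_results) != len(truths):
        raise ValueError("need exactly one ground-truth label per query")
    hits = sum(truth in ranked[:n] for ranked, truth in zip(ranked_results, truths))
    return hits / len(truths)


if __name__ == "__main__":
    # Toy example with three queries and a top-2 cutoff.
    ranked = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
    print(top_n_rate(ranked, ["a", "b", "c"], n=2))
```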
    Because the DTW computation time is greatly reduced, more key-transposition trials can be run to achieve a higher recognition rate while keeping the response time of the QBSH system within an acceptable limit, so the system can serve more users simultaneously.
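    Key transposition matters because a user rarely hums in the same key as the stored song. The sketch below illustrates the idea by shifting the query by a set of candidate semitone offsets and keeping the offset with the smallest DTW distance. The exhaustive search over a fixed range, the function names, and the default range of shifts are assumptions for illustration; the transposition scheme actually used in the thesis is not described in this record.

```python
def dtw_distance(a, b):
    """Textbook DTW using a rolling row; local cost is |a[i] - b[j]|."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def best_transposition(query, song, shifts=range(-6, 7)):
    """Try each semitone shift of the query; return (distance, shift) of the best.

    Every shift costs one full DTW computation, so the number of trials that
    fits in an acceptable response time depends on how fast one DTW run is.
    """
    return min((dtw_distance([p + s for p in query], song), s) for s in shifts)


if __name__ == "__main__":
    # The query is hummed five semitones below the stored melody.
    print(best_transposition([55, 57, 59], [60, 62, 64]))
```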

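    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Topic
        1.2 Overview of Related Work
        1.3 Research Direction and Main Results
        1.4 Chapter Overview
    Chapter 2 Optimizing the MIRACLE Melody Recognition System
        2.1 Introduction to the MIRACLE Melody Recognition System
        2.2 Introduction to CUDA (Compute Unified Device Architecture)
        2.3 Dynamic Time Warping
            2.3.1 Adjusting the Path
        2.4 Implementing the QBSH Core
            2.4.1 Loading the Song Database
            2.4.2 Adjusting the Song Database
            2.4.3 Adjusting the User's Input Pitch Vector
            2.4.4 Key Transposition of Songs
            2.4.5 Computing Type-1 DTW
            2.4.6 Computing Type-2 DTW
            2.4.7 Returning the Recognition Results
    Chapter 3 Experimental Results, Analysis, and Discussion
        3.1 Test Corpus and Database Used in the Experiments
        3.2 Experimental Environment Setup
        3.3 System Performance Before and After Adopting the CUDA Architecture
        3.4 Recognition Rate Analysis
            3.4.1 Type-1 Recognition Rate
            3.4.2 Type-2 Recognition Rate
    Chapter 4 Conclusions and Future Work
        4.1 Conclusions
        4.2 Future Work
    References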


    Full-text availability: not authorized for public release (on-campus or off-campus network)