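Improving Query-by-Singing/Humming by Using Melody and Lyric Information, with GPU Acceleration | National Tsing Hua University Theses and Dissertations Repository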


Author:	Wang, Chung-Che (王崇喆)
Title:	Improving Query-by-Singing/Humming by Using Melody and Lyric Information, with GPU Acceleration (以旋律及歌詞資訊改良哼唱選歌及其GPU加速)
Advisors:	Jang, Jyh-Shing (張智星); Chang, Jyun-Sheng (張俊盛)
Committee members:	Liu, Yi-Wen (劉奕汶); Wang, Yih-Ru (王逸如); Liao, Yuan-Fu (廖元甫)
Degree:	Doctor
Department:
Year of publication:	2017
Academic year of graduation:	105 (ROC calendar)
Language:	English
Number of pages:	72
Keywords:	combined melody distance and lyric similarity, query-by-singing/humming (QBSH), singing/humming discrimination (SHD), GPU acceleration

This thesis improves both the speed and the accuracy of a query-by-singing/humming (QBSH) system. To improve accuracy, we use melody and lyric information together. Singing/humming discrimination (SHD) is performed first to separate singing queries from humming queries. For a humming query, we apply a melody recognition method that uses pitch information only. For a singing query, we combine melody distance with lyric similarity to take advantage of the additional lyric information. To improve speed, we use graphics processing units (GPUs) to accelerate the melody recognition module. We accelerate database comparison, the most time-consuming component of the system, and try different parallelization schemes to optimize performance.
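The query flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the thesis's actual method: the function names (`melody_score`, `lyric_score`, `rank_songs`), the toy scoring formulas, the fixed `weight`, and the example `DATABASE` are all hypothetical. The singing/humming decision is passed in as a flag, whereas the thesis obtains it from a discrimination step, and the recognized syllables are assumed to come from a separate singing voice recognizer.

```python
from typing import Dict, List, Sequence


def melody_score(query: Sequence[float], target: Sequence[float]) -> float:
    """Toy pitch-only score in (0, 1]; higher means a closer melody.

    Hypothetical stand-in for a real melody distance.
    """
    n = min(len(query), len(target))
    if n == 0:
        return 0.0
    q, t = query[:n], target[:n]
    q_mean, t_mean = sum(q) / n, sum(t) / n
    # Subtract each mean so a query sung in a different key still matches.
    dist = sum(abs((a - q_mean) - (b - t_mean)) for a, b in zip(q, t)) / n
    return 1.0 / (1.0 + dist)


def lyric_score(recognized: Sequence[str], lyrics: Sequence[str]) -> float:
    """Toy lyric similarity: fraction of recognized syllables found in the lyrics."""
    if not recognized:
        return 0.0
    lyric_set = set(lyrics)
    return sum(1 for s in recognized if s in lyric_set) / len(recognized)


def rank_songs(
    query_pitch: Sequence[float],
    recognized: Sequence[str],
    is_singing: bool,
    database: Dict[str, Dict[str, list]],
    weight: float = 0.5,  # assumed mixing weight, chosen arbitrarily
) -> List[str]:
    """Return song titles ordered from best to worst match."""
    scores = {}
    # Each song is scored independently of the others, so this loop is the
    # kind of database comparison that lends itself to parallel execution.
    for title, song in database.items():
        score = melody_score(query_pitch, song["pitch"])
        if is_singing:
            # Singing query: blend melody and lyric evidence.
            score = weight * score + (1.0 - weight) * lyric_score(
                recognized, song["lyrics"]
            )
        scores[title] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical two-song database: pitch in semitones, lyrics as syllables.
DATABASE = {
    "song_a": {"pitch": [60, 62, 64, 62, 60], "lyrics": ["la", "li", "lu"]},
    "song_b": {"pitch": [60, 62, 64, 62, 61], "lyrics": ["ma", "mi", "mu"]},
}

humming_ranking = rank_songs([62, 64, 66, 64, 62], [], False, DATABASE)
singing_ranking = rank_songs(
    [62, 64, 66, 64, 62], ["ma", "mi", "mu"], True, DATABASE
)
```

In this toy setup the two songs have nearly identical melodies, so a humming query is ranked on pitch alone, while a singing query can use the recognized syllables to separate them. How the two scores should actually be normalized and weighted is a design question this sketch does not settle.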

Chapter 1. Introduction
    1 Related Work: Melody and Textual Lyric Information
    2 Related Work: Singing Voice Recognition
    3 Related Work: Singing/Humming Discrimination
    4 Related Work: Melody/Lyric Information Combination
    5 Related Work: GPU Acceleration for QBSH
Chapter 2. Basis of QBSH and GPU
    1 Basis of QBSH
    2 GPUs' Architecture and Programming
Chapter 3. Improving QBSH Using Melody and Lyrics Information
    1 Phone and Syllable Similarity
    2 Singing/Humming Discrimination
    3 Singing Voice Recognition
    4 Lyric Matching
    5 Distance/Similarity Combination
    6 Parallelization Schemes of LS
Chapter 4. Experiments
    1 Improved QBSH Using Melody and Lyrics Information
        1.1 Experimental Setup
        1.2 Experimental Results of SHD
        1.3 Melody Recognition Results
        1.4 Lyric Matching Results
        1.5 Combined Results
    2 Accelerating QBSH Using GPU
        2.1 Dataset
        2.2 Experiments
Chapter 5. Conclusions and Future Work
References

                                

