Graduate Student: 江衍霖 (Chiang, Yen-Lin)
Thesis Title: 基於音框層級自動標註預測之繪圖式音樂檢索系統 (Sketch-based Music Retrieval Based on Frame-level Auto-tagging Predictions)
Advisors: 楊奕軒 (Yang, Yi-Hsuan); 許秋婷 (Hsu, Chiou-Ting)
Oral Defense Committee: 王浩全 (Wang, Hao-Chuan); 陳宜欣 (Chen, Yi-Shin)
Degree Type: Master (碩士)
Department:
Year of Publication: 2017
Academic Year of Graduation: 105 (ROC calendar, 2016-2017)
Language: English
Number of Pages: 71
Keywords: music retrieval, sketch-based retrieval, human-computer interaction, auto-tagging
Chinese abstract (translated): This thesis proposes a novel, intuitive music retrieval system that allows users to express complex tag-location conditions through simple sketches alone, making it easy to precisely find the desired music. For example, in this system a user can precisely search for a “classical” piece that first contains a “violin” segment and then another segment in which “guitar” and “slow” both appear. To build a database of nearly ten thousand songs for the system, this work takes the prediction outputs of the deep learning-based frame-level auto-tagging model proposed by Liu and Yang and generates a segment-level database through a preprocessing mechanism also proposed in this thesis. The experiments use a questionnaire survey and a demo website to evaluate which users and use cases the proposed method suits. The results show that, compared with a non-sketch-based baseline system, the proposed sketch-based music retrieval system is rated significantly higher in “novelty” and “user-experience satisfaction”, and that its search technique will especially benefit practitioners in the multimedia-content-creation industries.
English abstract: We propose a novel and intuitive music retrieval interface that allows users to precisely search for music containing multiple localized social tags using only simple sketches. For example, one may search for a “classical” music clip that includes a segment with “violin”, followed by another segment that simultaneously includes “slow” and “guitar”; such complex conditions can be expressed simply and correctly in a query. We also build a segment-level database of thousands of songs, along with its preprocessing algorithms, for our retrieval method, which leverages the predictions of Liu and Yang’s deep learning-based frame-level auto-tagging model [1]. To assess how users perceive the system, we conducted a user study with a questionnaire and a demo website. Experimental results show that: i) the proposed sketch-based system outperforms the two non-sketch-based baselines we implemented in terms of “interestingness” and “user-experience satisfaction”; and ii) the proposed method is especially beneficial to multimedia content creators.
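The abstract only summarizes the two technical steps: turning frame-level auto-tagging predictions into a segment-level database, and matching an ordered, localized tag query against it. The following minimal Python sketch illustrates the general idea under stated assumptions; it is not the thesis's actual algorithm. It assumes a per-frame tag-probability matrix (e.g., the output of a frame-level model such as clip2frame [1]), thresholds it into tagged segments, and greedily checks an ordered list of tag conditions like the “violin” then “slow”+“guitar” example above. The function names (`frames_to_segments`, `matches_query`), the threshold, and the reading of “followed by” are all illustrative assumptions.

```python
# Illustrative sketch only; names, thresholds, and temporal semantics are
# assumptions, not the thesis's actual preprocessing or matching algorithm.
import numpy as np

def frames_to_segments(probs, tags, threshold=0.5, min_frames=3):
    """Merge consecutive frames whose probability for a tag exceeds
    `threshold` into (tag, start, end) segments; `probs` is an
    (n_frames, n_tags) array of frame-level tag predictions."""
    segments = []
    active = probs >= threshold
    for t, tag in enumerate(tags):
        start = None
        for f in range(len(active)):
            if active[f, t] and start is None:
                start = f                          # a segment opens
            elif not active[f, t] and start is not None:
                if f - start >= min_frames:        # drop spurious blips
                    segments.append((tag, start, f))
                start = None
        if start is not None and len(active) - start >= min_frames:
            segments.append((tag, start, len(active)))
    return segments

def _intersect(a, b):
    """Pairwise intersection of two lists of (start, end) intervals."""
    return [(max(s1, s2), min(e1, e2))
            for s1, e1 in a for s2, e2 in b
            if max(s1, s2) < min(e1, e2)]

def matches_query(segments, query):
    """True if the song has segments satisfying each tag set in `query`,
    in order, e.g. [{"violin"}, {"slow", "guitar"}] means a "violin"
    segment followed by a region where "slow" and "guitar" overlap."""
    earliest = 0
    for tag_set in query:
        common = None
        for tag in tag_set:
            spans = [(max(s, earliest), e) for (g, s, e) in segments
                     if g == tag and e > earliest]
            common = spans if common is None else _intersect(common, spans)
            if not common:
                return False
        # Greedily take the earliest joint occurrence; "followed by" is
        # read here as "starts strictly later", which may differ from
        # the thesis's exact temporal semantics.
        earliest = min(s for s, _ in common) + 1
    return True
```

For instance, `matches_query(frames_to_segments(probs, tags), [{"violin"}, {"slow", "guitar"}])` mirrors the example query in the abstract; in the actual system the segment-level database would be precomputed offline for all songs rather than per query.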
[1] J.-Y. Liu and Y.-H. Yang. Event Localization in Music Auto-tagging. In Proc. ACM Multimedia, 2016, [Online] https://github.com/ciaua/clip2frame.
[2] P. Knees. Searching for Audio by Sketching Mental Images of Sound: A Brave New Idea for Audio Retrieval in Creative Music Production. In Proc. ICMR, 2016.
[3] Y.-L. Chiang, Y.-S. Lee, W.-C. Hsieh, and J.-C. Wang. Efficient and Portable Content-based Music Retrieval System. In Proc. IEEE International Conference on Orange Technologies (ICOT), 2014.
[4] Y.-S. Lee, Y.-L. Chiang, P.-R. Lin, C.-H. Lin, and T.-C. Tai. Robust and Efficient Content-based Music Retrieval System. APSIPA Transactions on Signal and Information Processing, 2016.
[5] C. Poynton. Digital Video and HDTV: Algorithms and Interfaces. Morgan Kaufmann, 2003.
[6] J. Maller. RGB and YUV Color. 2016. [Online] http://joemaller.com/fcp/fxscript_yuv_color.shtml.
[7] P. Gaonkar, S. Varma, and R. Nikhare. A Survey on Content-Based Audio Retrieval Using Chord Progression. International Journal of Innovative Research in Computer and Communication Engineering, vol. 4, issue 1, Jan. 2016.
[8] Y.-H. Yang. Towards Real-time Music Auto-tagging Using Sparse Features. In Proc. ICME, 2013.
[9] D. Tingle, Y. E. Kim, and D. Turnbull. Exploring Automatic Music Annotation with Acoustically Objective Tags. In Proc. ACM MIR, 2010, pp. 55-62, [Online] http://cosmal.ucsd.edu/cal/projects/AnnRet/.
[10] E. Law, K. West, M. I. Mandel, M. Bay, and J. S. Downie. Evaluation of Algorithms Using Games: The Case of Music Tagging. In Proc. ISMIR, 2009, [Online] http://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset.
[11] C. L. Zitnick and S. B. Kang. Stereo for Image-Based Rendering using Image Over-Segmentation. International Journal of Computer Vision (IJCV), 2007.
[12] C. Pal, A. Chakrabarti, and R. Ghosh. A Brief Survey of Recent Edge-Preserving Smoothing Algorithms on Digital Images. arXiv:1503.07297 [cs.CV], Mar. 2015.
[13] P. Eggleston. Understanding Oversegmentation and Region Merging. Vision Systems Design, Dec. 1, 1998.
[14] Z. Lu, Z. Fu, T. Xiang, P. Han, L. Wang, and X. Gao. Learning from Weak and Noisy Labels for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.
[15] P. Vellachu and S. Abburu. Tag Based Audio Search Engine. International Journal of Computer Science Issues (IJCSI), vol. 9, issue 2, no. 3, Mar. 2012.
[16] Google Inc. (2017) “AutoDraw - A.I. Experiments”. [Online] https://aiexperiments.withgoogle.com/autodraw.
[17] M. A. Casey et al. Content-Based Music Information Retrieval: Current Directions and Future Challenges. Proceedings of the IEEE, vol. 96, no. 4, Apr. 2008.
[18] N. Borjian. A Survey on Query-by-Example based Music Information Retrieval. International Journal of Computer Applications, vol. 158, no. 8, Jan. 2017.
[19] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. arXiv:1703.06870 [cs.CV], Mar. 2017.
[20] R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello. MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research. In Proc. ISMIR, pp. 155-160, 2014. [Online] http://medleydb.weebly.com.
[21] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv:1506.01497v3 [cs.CV], Jan. 2016.
[22] S. Oramas and L. Espinosa. Tutorial - Natural Language Processing for Music Information Retrieval. Poblenou Campus, UPF, Jan. 2017. [Online] https://www.upf.edu/web/mdm-dtic/tutorial-natural-language-processing-for-music-information-retrieval.
[23] B. Schuller, G. Rigoll, and M. Lang. HMM-based Music Retrieval Using Stereophonic Feature Information and Framelength Adaptation. In Proc. ICME, 2003.
[24] H.-M. Wang and L.-S. Lee. Tone Recognition for Continuous Mandarin Speech with Limited Training Data Using Selected Context-dependent Hidden Markov Models. Journal of the Chinese Institute of Engineers, vol. 17, no. 6, pp. 775-784, 1994.
[25] H. Li. (2016) Deep Learning for Information Retrieval. [Online] http://www.hangli-hl.com/uploads/3/4/4/6/34465961/deep_learning_for_information_retrieval.pdf.
[26] R. Typke. Music Retrieval Based on Melodic Similarity. Ph.D. Thesis, Utrecht University, the Netherlands, 2007.
[27] Hooktheory, LLC. “Songs with the same chords - Theorytab”. [Online] https://www.hooktheory.com/trends.
[28] R. Typke, F. Wiering, and R. C. Veltkamp. A Survey of Music Information Retrieval Systems. In Proc. ISMIR, pp. 153–160, 2005.
[29] BlogPress. “What Is The Difference Between Tags And Keywords?” [Online] https://theblogpress.com/blog/what-is-the-difference-between-tags-and-keywords/.
[30] A. Garcia. (Nov. 12, 2015) "The Importance of Visual Consistency in UI Design". [Online] http://www.uxpassion.com/blog/the-importance-of-visual-consistency-in-ui-design/.
[31] D. D. McCracken and E. D. Reilly. Backus-Naur form (BNF). Encyclopedia of Computer Science, 4th edition, pp. 129-131, John Wiley and Sons Ltd. Chichester, UK, 2003. ISBN: 0-470-86412-5.
[32] L. Li, H. Zhao, W. Zhang, and W. Wang. An Action-Stack Based Selective-Undo Method in Feature Model Customization. In Proc. International Conference on Software Reuse (ICSR), 2013.