Graduate Student: 江衍霖 (Chiang, Yen-Lin)
Thesis Title: 基於音框層級自動標註預測之繪圖式音樂檢索系統 (Sketch-based Music Retrieval Based on Frame-level Auto-tagging Predictions)
Advisors: 楊奕軒 (Yang, Yi-Hsuan); 許秋婷 (Hsu, Chiou-Ting)
Oral Defense Committee: 王浩全 (Wang, Hao-Chuan); 陳宜欣 (Chen, Yi-Shin)
Degree Type: Master (碩士)
Department:
Year of Publication: 2017
Academic Year of Graduation: 105 (ROC calendar, 2016-2017)
Language: English
Number of Pages: 71
Keywords: music retrieval, sketch-based retrieval, human-computer interaction, auto-tagging
Chinese abstract (translated): This thesis proposes a novel, intuitive music retrieval system that allows users to express complex tag-location conditions through simple sketches alone, making it easy to precisely find the desired music. For example, in this system a user can precisely search for a “classical” piece that first contains a “violin” segment and then another segment in which “guitar” and “slow” both appear. To build a database of nearly ten thousand songs for the system, this work takes the prediction outputs of the deep learning-based frame-level auto-tagging model proposed by Liu and Yang and generates a segment-level database through a preprocessing mechanism also proposed in this thesis. The experiments use a questionnaire survey and a demo website to evaluate which users and use cases the proposed method suits. The results show that, compared with a non-sketch-based baseline system, the proposed sketch-based music retrieval system is rated significantly higher in “novelty” and “user-experience satisfaction”, and that its search technique will especially benefit practitioners in the multimedia-content-creation industries.
English abstract: We propose a novel and intuitive music retrieval interface that allows users to precisely search for music containing multiple localized social tags using only simple sketches. For example, one may search for a “classical” music clip that includes a segment with “violin”, followed by another segment that simultaneously includes “slow” and “guitar”; such complex conditions can be expressed simply and correctly in a query. We also build a segment-level database of thousands of songs, along with its preprocessing algorithms, for our retrieval method, which leverages the predictions of Liu and Yang’s deep learning-based frame-level auto-tagging model [1]. To assess how users perceive the system, we conducted a user study with a questionnaire and a demo website. Experimental results show that: i) the proposed sketch-based system outperforms the two non-sketch-based baselines we implemented in terms of “interestingness” and “user-experience satisfaction”; and ii) the proposed method is especially beneficial to multimedia content creators.
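The abstract only summarizes the two technical steps: turning frame-level auto-tagging predictions into a segment-level database, and matching an ordered, localized tag query against it. The following minimal Python sketch illustrates the general idea under stated assumptions; it is not the thesis's actual algorithm. It assumes a per-frame tag-probability matrix (e.g., the output of a frame-level model such as clip2frame [1]), thresholds it into tagged segments, and greedily checks an ordered list of tag conditions like the “violin” then “slow”+“guitar” example above. The function names (`frames_to_segments`, `matches_query`), the threshold, and the reading of “followed by” are all illustrative assumptions.

```python
# Illustrative sketch only; names, thresholds, and temporal semantics are
# assumptions, not the thesis's actual preprocessing or matching algorithm.
import numpy as np

def frames_to_segments(probs, tags, threshold=0.5, min_frames=3):
    """Merge consecutive frames whose probability for a tag exceeds
    `threshold` into (tag, start, end) segments; `probs` is an
    (n_frames, n_tags) array of frame-level tag predictions."""
    segments = []
    active = probs >= threshold
    for t, tag in enumerate(tags):
        start = None
        for f in range(len(active)):
            if active[f, t] and start is None:
                start = f                          # a segment opens
            elif not active[f, t] and start is not None:
                if f - start >= min_frames:        # drop spurious blips
                    segments.append((tag, start, f))
                start = None
        if start is not None and len(active) - start >= min_frames:
            segments.append((tag, start, len(active)))
    return segments

def _intersect(a, b):
    """Pairwise intersection of two lists of (start, end) intervals."""
    return [(max(s1, s2), min(e1, e2))
            for s1, e1 in a for s2, e2 in b
            if max(s1, s2) < min(e1, e2)]

def matches_query(segments, query):
    """True if the song has segments satisfying each tag set in `query`,
    in order, e.g. [{"violin"}, {"slow", "guitar"}] means a "violin"
    segment followed by a region where "slow" and "guitar" overlap."""
    earliest = 0
    for tag_set in query:
        common = None
        for tag in tag_set:
            spans = [(max(s, earliest), e) for (g, s, e) in segments
                     if g == tag and e > earliest]
            common = spans if common is None else _intersect(common, spans)
            if not common:
                return False
        # Greedily take the earliest joint occurrence; "followed by" is
        # read here as "starts strictly later", which may differ from
        # the thesis's exact temporal semantics.
        earliest = min(s for s, _ in common) + 1
    return True
```

For instance, `matches_query(frames_to_segments(probs, tags), [{"violin"}, {"slow", "guitar"}])` mirrors the example query in the abstract; in the actual system the segment-level database would be precomputed offline for all songs rather than per query.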
[1] J.-Y. Liu and Y.-H. Yang. Event Localization in Music Auto-tagging. In Proc. ACM Multimedia, 2016, [Online] https://github.com/ciaua/clip2frame.
[2] P. Knees. Searching for Audio by Sketching Mental Images of Sound: A Brave New Idea for Audio Retrieval in Creative Music Production. In Proc. ICMR, 2016.
[3] Y.-L. Chiang, Y.-S. Lee, W.-C. Hsieh, and J.-C. Wang. Efficient and Portable Content-based Music Retrieval System. In Proc. IEEE International Conference on Orange Technologies (ICOT), 2014.
[4] Y.-S. Lee, Y.-L. Chiang, P.-R. Lin, C.-H. Lin, and T.-C. Tai. Robust and Efficient Content-based Music Retrieval System. APSIPA Transactions on Signal and Information Processing, 2016.
[5] C. Poynton. Digital Video and HDTV: Algorithms and Interfaces. Morgan Kaufmann, 2003.
[6] J. Maller. RGB and YUV Color. 2016. [Online] http://joemaller.com/fcp/fxscript_yuv_color.shtml.
[7] P. Gaonkar, S. Varma, and R. Nikhare. A Survey on Content-Based Audio Retrieval Using Chord Progression. International Journal of Innovative Research in Computer and Communication Engineering, vol. 4, issue 1, Jan. 2016.
[8] Y.-H. Yang. Towards Real-time Music Auto-tagging Using Sparse Features. In Proc. ICME, 2013.
[9] D. Tingle, Y. E. Kim, and D. Turnbull. Exploring Automatic Music Annotation with Acoustically Objective Tags. In Proc. ACM MIR, 2010, pp. 55-62, [Online] http://cosmal.ucsd.edu/cal/projects/AnnRet/.
[10] E. Law, K. West, M. I. Mandel, M. Bay, and J. S. Downie. Evaluation of Algorithms Using Games: The Case of Music Tagging. In Proc. ISMIR, 2009, [Online] http://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset.
[11] C. L. Zitnick and S. B. Kang. Stereo for Image-Based Rendering using Image Over-Segmentation. International Journal of Computer Vision (IJCV), 2007.
[12] C. Pal, A. Chakrabarti, and R. Ghosh. A Brief Survey of Recent Edge-Preserving Smoothing Algorithms on Digital Images. arXiv:1503.07297 [cs.CV], Mar. 2015.
[13] P. Eggleston. Understanding Oversegmentation and Region Merging. Vision Systems Design, Dec. 1, 1998.
[14] Z. Lu, Z. Fu, T. Xiang, P. Han, L. Wang, and X. Gao. Learning from Weak and Noisy Labels for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.
[15] P. Vellachu and S. Abburu. Tag Based Audio Search Engine. International Journal of Computer Science Issues (IJCSI), vol. 9, issue 2, no. 3, Mar. 2012.
[16] Google Inc. (2017) “AutoDraw - A.I. Experiments”. [Online] https://aiexperiments.withgoogle.com/autodraw.
[17] M. A. Casey et al. Content-Based Music Information Retrieval: Current Directions and Future Challenges. Proceedings of the IEEE, vol. 96, no. 4, Apr. 2008.
[18] N. Borjian. A Survey on Query-by-Example based Music Information Retrieval. International Journal of Computer Applications, vol. 158, no. 8, Jan. 2017.
[19] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. arXiv:1703.06870 [cs.CV], Mar. 2017.
[20] R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello. MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research. In Proc. ISMIR, pp. 155-160, 2014. [Online] http://medleydb.weebly.com.
[21] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv:1506.01497v3 [cs.CV], Jan. 2016.
[22] S. Oramas and L. Espinosa. Tutorial - Natural Language Processing for Music Information Retrieval. Poblenou Campus, UPF, Jan. 2017. [Online] https://www.upf.edu/web/mdm-dtic/tutorial-natural-language-processing-for-music-information-retrieval.
[23] B. Schuller, G. Rigoll, and M. Lang. HMM-based Music Retrieval Using Stereophonic Feature Information and Framelength Adaptation. In Proc. ICME, 2003.
[24] H.-M. Wang and L.-S. Lee. Tone Recognition for Continuous Mandarin Speech with Limited Training Data Using Selected Context-dependent Hidden Markov Models. Journal of the Chinese Institute of Engineers, vol. 17, no. 6, pp. 775-784, 1994.
[25] H. Li. (2016) Deep Learning for Information Retrieval. [Online] http://www.hangli-hl.com/uploads/3/4/4/6/34465961/deep_learning_for_information_retrieval.pdf.
[26] R. Typke. Music Retrieval Based on Melodic Similarity. Ph.D. Thesis, Utrecht University, the Netherlands, 2007.
[27] Hooktheory, LLC. “Songs with the same chords - Theorytab”. [Online] https://www.hooktheory.com/trends.
[28] R. Typke, F. Wiering, and R. C. Veltkamp. A Survey of Music Information Retrieval Systems. In Proc. ISMIR, pp. 153–160, 2005.
[29] BlogPress. “What Is The Difference Between Tags And Keywords?” [Online] https://theblogpress.com/blog/what-is-the-difference-between-tags-and-keywords/.
[30] A. Garcia. (Nov. 12, 2015) "The Importance of Visual Consistency in UI Design". [Online] http://www.uxpassion.com/blog/the-importance-of-visual-consistency-in-ui-design/.
[31] D. D. McCracken and E. D. Reilly. Backus-Naur form (BNF). Encyclopedia of Computer Science, 4th edition, pp. 129-131, John Wiley and Sons Ltd. Chichester, UK, 2003. ISBN: 0-470-86412-5.
[32] L. Li, H. Zhao, W. Zhang, and W. Wang. An Action-Stack Based Selective-Undo Method in Feature Model Customization. In Proc. International Conference on Software Reuse (ICSR), 2013.