Melody Recognition System Design: Performance Optimization with Constrained Response Time

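Graduate student:	王儀蓁 Yi-Chen (Eva) Wang
Thesis title:	Melody Recognition System Design: Performance Optimization with Constrained Response Time
Advisor:	張智星 Roger Jang
Oral defense committee:
Degree:	Master's
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication:	2005
Graduation academic year:	93 (Republic of China calendar; academic year 2004-2005)
Language:	Chinese
Number of pages:	46
Keywords:	melody recognition, speedup, constrained response time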

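As multimedia data rapidly evolves and grows in volume, multimedia information retrieval is becoming an increasingly important topic. Retrieval results are usually judged by two measures: recognition accuracy and recognition speed. How to strike a balance between the two so that overall performance is best is an issue worth investigating.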
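This thesis focuses on speeding up recognition. It makes recognition substantially faster while affecting recognition accuracy only slightly. The thesis has two parts. Part one is "speedup theory for dynamic time warping (DTW)". Part two is "implementing an accelerated melody recognition system with a performance-optimized design under a limited response time".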
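"Speedup theory for DTW" accelerates DTW, a commonly used multimedia retrieval method. This thesis proposes stage-wise recognition, which, as the name suggests, divides recognition into stages. The first stage uses salient features to quickly eliminate songs that are unlikely matches. The second stage performs recognition using only a subset of the sample points of the song to be recognized. Finally, the candidate songs that survive the first two stages undergo a full comparison.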
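The stage-wise flow described above can be sketched as a generic filtering cascade. This is a minimal illustration under our own assumptions, not the thesis's implementation. The names `cascade_search`, `pitch_range_gap`, and `l1_distance` are hypothetical. The two scoring functions are simple stand-ins; the thesis's actual first-stage feature and its DTW comparison are not reproduced here. Each stage is also assumed to keep a fixed fraction of the songs it receives.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pitch = Sequence[float]
Scorer = Callable[[Pitch, Pitch], float]  # lower score = more similar (assumption)


def pitch_range_gap(q: Pitch, r: Pitch) -> float:
    # Cheap stand-in feature: difference in melodic range (max - min), in semitones.
    return abs((max(q) - min(q)) - (max(r) - min(r)))


def l1_distance(q: Pitch, r: Pitch) -> float:
    # Stand-in for a full comparison (the thesis uses DTW); compares overlapping frames.
    return sum(abs(a - b) for a, b in zip(q, r))


def cascade_search(query: Pitch, database: Dict[str, Pitch],
                   stages: List[Tuple[Scorer, float]]) -> List[Tuple[str, float]]:
    """Run stages in order; each keeps the best-scoring fraction of the surviving songs."""
    candidates = list(database)
    scores: Dict[str, float] = {}
    for score_fn, keep_fraction in stages:
        scores = {name: score_fn(query, database[name]) for name in candidates}
        candidates.sort(key=lambda name: scores[name])
        keep = max(1, round(len(candidates) * keep_fraction))
        candidates = candidates[:keep]
    # Survivors of the last stage, ranked by that stage's score.
    return [(name, scores[name]) for name in candidates]
```

You can express a stage that uses only a subset of sample points with slicing, for example `lambda q, r: l1_distance(q[::4], r[::4])`. In a real system, the final stage would use the full DTW comparison.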
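"Performance-optimized design under a limited response time" derives a mathematical framework and uses dynamic programming to design the recognition system. Under a response-time constraint, the design maximizes recognition accuracy and so balances efficiency against recognition rate. Experiments are designed to demonstrate its feasibility and efficiency.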
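One way to write down this optimization is shown below. It is a sketch in our own notation; N, K, θ_k, t_k, s_k, r_k, and T_max are assumptions, not the thesis's symbols. N is the number of songs in the database and K the number of stages. Stage k runs with setting θ_k, which gives it a per-song comparison time t_k, a survival rate s_k, and a recognition rate r_k. We assume r_k is the probability that the correct song survives stage k, given that it survived the earlier stages, so the product R is the overall recognition rate. The recursion works on the time budget b per song entering stage k. That stage costs t_k per song and forwards a fraction s_k, so the per-song cost obeys C_k = t_k + s_k C_{k+1}, and N C_1 = T.

```latex
N_1 = N, \qquad N_{k+1} = s_k(\theta_k)\, N_k \quad (k = 1, \dots, K-1)

T(\theta_1, \dots, \theta_K) = \sum_{k=1}^{K} N_k\, t_k(\theta_k)

R(\theta_1, \dots, \theta_K) = \prod_{k=1}^{K} r_k(\theta_k)

\max_{\theta_1, \dots, \theta_K} R \quad \text{subject to} \quad T \le T_{\max}

V_{K+1}(b) = \begin{cases} 1, & b \ge 0 \\ 0, & b < 0 \end{cases}
\qquad
V_k(b) = \max_{\theta} \; r_k(\theta)\, V_{k+1}\!\left(\frac{b - t_k(\theta)}{s_k(\theta)}\right)

R^{*} = V_1\!\left(T_{\max} / N\right)
```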

This thesis discusses methods for speeding up melody recognition. The algorithm commonly used for melody recognition today is dynamic time warping (DTW). As the database grows, the time cost of DTW grows substantially as well. We therefore propose a "step-by-step" recognition flow for acceleration.
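For reference, a textbook DTW distance between two pitch sequences looks like the sketch below. It assumes an absolute-difference local cost and the three standard local steps. The thesis's exact path constraints and cost function are not given in this record. The nested loops fill an (m+1) by (n+1) table for every reference song, which is why the total cost grows with database size.

```python
import math
from typing import Sequence


def dtw_distance(query: Sequence[float], ref: Sequence[float]) -> float:
    """Textbook DTW distance between two pitch sequences (e.g. in semitones)."""
    m, n = len(query), len(ref)
    # d[i][j]: minimal cumulative cost of aligning query[:i] with ref[:j]
    d = [[math.inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - ref[j - 1])  # local cost: absolute pitch difference
            # predecessors: query advances alone, reference advances alone, or both advance
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[m][n]
```

Because each local step may repeat a frame, a time-stretched copy of a melody has distance 0. For example, `dtw_distance([60, 62, 64], [60, 60, 62, 64, 64])` is 0.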
In the first step, we compute the semitone differences between the test song and each reference song. By setting a threshold, we keep a number of songs that are most similar to the test song. Next, we downsample the pitch vector according to a frame-rate parameter, which reduces the time cost in proportion to the frame rate. Depending on the survival rate of each stage, a different number of database songs is kept after that stage. In this way, we can combine several melody recognition methods and design a system that reduces the time cost while keeping the recognition rate as high as possible.
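The two first-step ideas can be sketched as follows, under assumptions stated up front. This record does not define exactly how the semitone difference is computed. The hypothetical `semitone_difference` below first subtracts each sequence's mean pitch over the overlapping frames, a simple form of key shifting. It then averages the absolute difference. `threshold_filter` and `downsample` are likewise hypothetical names. Downsampling keeps every `factor`-th frame, which lowers the frame rate by that factor.

```python
from statistics import mean
from typing import Dict, List, Sequence

Pitch = Sequence[float]


def key_normalize(pitch: Pitch) -> List[float]:
    # Simple key shifting (assumption): subtract the mean pitch so transposed melodies line up.
    mu = mean(pitch)
    return [p - mu for p in pitch]


def semitone_difference(query: Pitch, ref: Pitch) -> float:
    # Mean absolute semitone difference over the overlapping frames, after key shifting.
    n = min(len(query), len(ref))
    if n == 0:
        raise ValueError("both pitch sequences must be non-empty")
    q, r = key_normalize(query[:n]), key_normalize(ref[:n])
    return sum(abs(a - b) for a, b in zip(q, r)) / n


def threshold_filter(query: Pitch, database: Dict[str, Pitch], threshold: float) -> List[str]:
    # Keep only songs whose semitone difference to the query is within the threshold.
    return [name for name, ref in database.items()
            if semitone_difference(query, ref) <= threshold]


def downsample(pitch: Pitch, factor: int) -> List[float]:
    # Keep every factor-th frame, lowering the effective frame rate by `factor`.
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return list(pitch[::factor])
```

A transposed copy of the query passes the filter with difference 0, and `downsample([1, 2, 3, 4, 5, 6, 7], 3)` returns `[1, 4, 7]`.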
The combined system consists of several stages. The mathematical analysis requires two kinds of data for each stage: its recognition rate and its per-song comparison time, both as functions of the stage's survival rate. Given this information, dynamic programming yields approximate parameter settings for each stage. The resulting system balances speed and performance, and its stages speed up recognition while guaranteeing the recognition rate.
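Below is a minimal dynamic-programming sketch of the parameter search. It rests on our own assumptions:

- Each stage offers a few candidate settings, measured beforehand as the paragraph describes.
- Each setting is described by a per-song comparison time, a survival rate, and a stage recognition rate.
- Stage rates multiply into the overall rate.
- The response-time limit applies to the total comparison time.

`optimize_stages` is a hypothetical name. The recursion works on the time budget per song entering a stage. A stage spends t per input song and passes a fraction s on, so the next stage may spend (b - t) / s per song it receives.

```python
import math
from functools import lru_cache
from typing import Sequence, Tuple

# One candidate setting of a stage: (per-song comparison time, survival rate, stage recognition rate)
Setting = Tuple[float, float, float]


def optimize_stages(stages: Sequence[Sequence[Setting]], n_songs: int,
                    time_limit: float) -> Tuple[float, Tuple[int, ...]]:
    """Return (best overall recognition rate, chosen setting index per stage).

    A rate of 0.0 with an empty plan means no combination meets the time limit.
    """
    num_stages = len(stages)

    @lru_cache(maxsize=None)
    def best(k: int, budget: float) -> Tuple[float, Tuple[int, ...]]:
        # budget: time still allowed per song that enters stage k
        if k == num_stages:
            return 1.0, ()
        best_rate, best_plan = 0.0, ()
        for idx, (t, s, r) in enumerate(stages[k]):
            if t > budget:
                continue  # this setting alone already exceeds the budget
            # Each input song costs t here; only a fraction s reaches stage k + 1,
            # so the next stage may spend (budget - t) / s per song it receives.
            next_budget = (budget - t) / s if s > 0 else math.inf
            rate, plan = best(k + 1, next_budget)
            if r * rate > best_rate:
                best_rate, best_plan = r * rate, (idx,) + plan
        return best_rate, best_plan

    return best(0, time_limit / n_songs)
```

As an illustration with made-up numbers, take 1,000 songs and two stages:

- The first stage offers a skip option (0.0, 1.0, 1.0) and a coarse filter (0.001, 0.2, 0.95).
- The second stage is a full comparison (0.1, 1.0, 0.9).

A limit of 30 time units forces the coarse filter (overall rate 0.855). A limit of 200 lets the search skip it (overall rate 0.9).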

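Abstract
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
      1.1 Research Topic
      1.2 Introduction to Melody Recognition
      1.3 Research Direction and Main Results of This Thesis
      1.4 Chapter Overview
Chapter 2 Speeding Up Dynamic Time Warping (DTW)
      2.1 Basic Techniques of DTW
          2.1.1 Input Collection
          2.1.2 Pitch Tracking
          2.1.3 Key Shifting
          2.1.4 Pitch Smoothing
          2.1.5 Note Segmentation
      2.2 Stage-wise Recognition
          2.2.1 Stage 1
          2.2.2 Stage 2
          2.2.3 Stage 3
          2.2.4 Stage 4
          2.2.5 Final Stage
Chapter 3 Mathematical Derivation of the Stage-wise Recognition Method
      3.1 Concepts of the Mathematical Derivation
          3.1.1 Two-Stage Recognition
          3.1.2 Three-Stage Recognition
          3.1.3 Multi-Stage Recognition
          3.1.4 Performance Evaluation Method
      3.2 Experimental Results
          3.2.1 Experiment 1
          3.2.2 Experiment 2
          3.2.3 Experiment 3
          3.2.4 Experiment 4
      3.3 Error Analysis and Comparison
Chapter 4 Conclusions and Future Work
References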

                                

【1】 Jang, J.-S. Roger, and Gao, Ming-Yang, "A Query-by-Singing System based on Dynamic Programming", International Workshop on Intelligent Systems Resolutions (the 8th Bellman Continuum), PP. 85-89, Hsinchu, Taiwan, Dec 2000.
【2】 B.-K. Yi and C. Faloutsos, “Fast Time Sequence Indexing for Arbitrary Lp Norms,” in Proc. Of VLDB, Sept., 2000.
【3】 Yunyue Zhu and Dennis ShaSha, “Warping Indexes with Envelope Transforms for Query by Humming”,
【4】 B. K. Yi, H. V. Jagadish, and Christos Faloutsos, “Efficient Retrieval of Similar Time Sequences under Time Warping”,
【5】 F. K.-P Chan, A. W.-c. Fu, and C. Yu, “Haar Wavelets for Efficient Similarity Search of Time-Series: With and Without Time Warping,” IEEE Trans. on Knowledge and Data Engineering, 15(3):686-705, May/June, 2003.
【6】 H. H. Shih, S. S. Narayanan, and C.-C. Jay Kuo, “A Statistical Multidimensional Humming Transcription Using Phone Level Hidden Markov Models for Query by Humming System”,
【7】 Ghias, A. J. and Logan, D. Chamberlain, B. C. Smith, “Query by humming-musical information retrieval in an audio database”, ACM Multimedia ’95 San Francisco, 1995. (http://www2.cs.cornell.edu/zeno/Papers/humming/humming.html)
【8】 Ning-Han Liu, Yi-Hung Wu, and Arbee L.P. Chen, “Efficient K-NN Search in Polyphonic Music Databases Using a Lower Bounding Mechanism”,
【9】 X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development: Prentice Hall PTR, 2001.
【10】 S.-W. Kim, S. Park, and W. W. Chu, “An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Database,” in Proc. Of IEEE Data Engineering, Germany, pp. 607-614, April, 2001.
【11】 M. Vlachos, M. Hadjieleftheriou, D.G. y, and E. Keogh, “Indexing Multi-Dimensional Time-Series with Support for Multiple Distance Measure,” in Proc. of ACM SIGKDD, Aug., 2003.
【12】 Lawrence Rabiner, B.H Juang, Fundamentals of speech recognition, Prentice Hall, 1993.
【13】 F. Korn, H. V. Jagadish, and C. Faloutsos, “Efficiently Supporting AD Hoc Queries in Large Datasets of Time Sequences,” in Proc. Of ACM SIGMOD, Arizona, pp. 289-300, May, 1997.
【14】 Christopher Raphael, “Automatic segmentation of acoustic musical signals using hidden markov models”, in IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, pp. 360-370.
【15】 Christopher Raphael, “Automated rhythm transcription,” in International Symposium on Music Information Retrieval(IS-MIR 2001), 2001.
【16】 Adriane Swalm Durey and Mark A. Clements, “Melody spoting using hidden markov models,” in International Symposium on Music Information Retrieval(IS-MIR 2001), 2001, pp.109-117.
【17】 Adriane Swalm Durey and Mark A. Clements, “Features for spotting using hidden markov models,” in ICASSP 2002, 2002.
【18】Steven Young, The HTK Book version 3, Microsoft Corporation, 2000.
【19】 McNab, R. J. and Smith, L. A. “Melody transcription for interactive applications” Department of Computer Science University of Waikato, New Zealand.
【20】 McNab, R. J., Smith, L. A. and Witten, Jan H. “Towards the Digital Music Library: Tune Retrieval from Acoustic Input” ACM, 1996.
【21】 McNab, R. J., Smith, L. A., Witten, I. H. and Henderson, C. L. “Tune Retrieval in the Multimedia Library,”
【22】McNab, R. J.,Smith, L. A. and Witten, Jan H. “Signal Processing for Melody Transcription” Proceedings of the 19th Australasian Computer Science Conference, 1996.
【23】Torres, L. and Huguet, J., “An Improvement on Codebook Search for Vector Quantization,” IEEE Transactions on Communication, Vol 42, No. 2/3/4, PP. 208-210, February/March/April, 1994.
【24】Chan, Chok-ki, and Ma, Chi-Kit, “A Fast Method of Designing Better Codebooks for Image Vector Quantization,” IEEE Transactions on Communications, Vol. 42, No. 2/3/4, PP. 237/242, February/March/April, 1994.
【25】Yianilos, Peter N. “Data structures and algorithms for nearest neighbor search in general metric spaces,” In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 311-321, Austin, Texas, 25-27 January 1993.
【26】Fukunaga, Keinosuke and M. Narendra, Patrenehalli, “ A Branch and Bound Algorithm for Computing K_Nearest Neighbors”, IEEE Transactions on Computer, July 1975.
【27】Chen B. and Jang, J.-S, Roger, “Query by Singing”, 11th IPPR Conference on Computer Vision, Graphics, and Image Processing, PP. 529-536, Taiwan, Aug 1998.
【28】Rainer Typke, Marc den Hoed, Justin de Nooijer, Frans Wiering, Remco C. Veltkamp Ultecht University, ” A Ground Truth For Half A Million Musical Incipits”, DIR “05”, January 10-11, 2005.
【29】Simon Sheu and Jinxiong Shen, “Effective Filtering for Nearest-Neighbors Queries in Large Time-Series Databases”, pp. 48-55, in Proc. of the 2003
National Computer Symposium (NCS), Taichung, Taiwan, Dec. 2003.
【30】N. Hu and R. B. Dannenberg, "A Comparison of Melodic Database Retrieval Techniques Using Sung Queries," in Proceedings of the second ACM/IEEE-CS joint
conference on Digital libraries, New York:ACM Press 2002, pp. 301-307.

Full-text availability: not authorized for public release (campus network and external network).

簡易檢索 / 詳目顯示

相關論文