
Graduate Student: 陳俊達 (Chen, Chun-Ta)
Thesis Title: 音樂與樂譜對位之研究-使用onsets與調整型常數Q頻譜 (The Study of Audio-to-Score Alignment Using Onsets and Modified Constant Q Spectra)
Advisors: 張智星 (Jang, Jyh-Shing Roger); 張俊盛 (Chang, Jason S.)
Committee Members: 陳宜欣 (Chen, Yi-Shin); 王逸如 (Wang, Yih-Ru); 冀泰石 (Chi, Tai-Shih)
Degree: Doctoral
Department:
Year of Publication: 2018
Graduation Academic Year: 106 (ROC calendar)
Language: English
Number of Pages: 44
Keywords (Chinese): 音樂同步, 樂譜追蹤, 語音onset偵測, 音樂與樂譜對位
Keywords (English): music synchronization, score following, audio onset detection, audio-to-score alignment
  • This thesis proposes an effective algorithm for aligning polyphonic music with its corresponding score. The proposed framework consists of three steps: onset detection, note matching, and dynamic programming. In the first step, we perform onset detection and then extract features by applying a constant-Q transform around each onset. A similarity matrix is computed using a note-matching function that evaluates the similarity between notes in the score and notes in the music. Finally, dynamic programming is used to extract the best matching path in the similarity matrix. In our implementation, we compared five onset detectors and three types of spectrum-difference vectors. Experimental results show that our method achieves higher precision than other algorithms. We also propose an online approach based on onset detection that can detect most notes within 10 ms. Based on our experiments, the online version also outperforms other methods when the tolerance window is 50 ms.
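The first stage of the pipeline described above is detecting note onsets in the audio. As a rough illustration only: the sketch below implements a single spectral-flux detector with simple peak picking, not the thesis's fusion of multiple detection functions with a classifier; the function name `spectral_flux_onsets` and all parameter values are assumptions made for this example.

```python
import numpy as np

def spectral_flux_onsets(x, sr, frame=1024, hop=512, delta=0.1):
    """Toy spectral-flux onset detector; returns onset times in seconds."""
    n_frames = 1 + (len(x) - frame) // hop
    win = np.hanning(frame)
    # Magnitude spectrogram: one windowed FFT frame per hop.
    mags = np.array([np.abs(np.fft.rfft(win * x[i * hop:i * hop + frame]))
                     for i in range(n_frames)])
    # Spectral flux: half-wave-rectified frame-to-frame magnitude increase.
    flux = np.maximum(mags[1:] - mags[:-1], 0.0).sum(axis=1)
    flux /= flux.max() + 1e-12
    # Local maxima above the threshold are reported as onsets.
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > flux[i - 1] and flux[i] >= flux[i + 1]
             and flux[i] > delta]
    return (np.array(peaks) + 1) * hop / sr
```

In the full method, a constant-Q feature would then be extracted around each detected onset time to feed the note-matching step.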


    This thesis proposes an effective algorithm for polyphonic audio-to-score alignment, which aligns a polyphonic music performance to its corresponding symbolic score. The proposed framework consists of three steps: onset detection, note matching, and dynamic programming. In the first step, we perform onset detection and then extract onset features by applying a constant-Q transform around each onset. A similarity matrix is computed using a note-matching function that evaluates the similarity between concurrent notes in the music score and onsets in the audio recording. Finally, dynamic programming is used to extract the best alignment path from the similarity matrix. In our implementation, we compared five onset detectors and three types of spectrum-difference vectors at audio onsets. Experimental results show that our method achieves higher precision than competing algorithms. We also propose an online approach based on onset detection that can detect most notes within just 10 ms. Based on our experiments, the online version also outperforms other methods when the tolerance window is 50 ms.
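The final step described above, extracting the best alignment path from the similarity matrix via dynamic programming, can be sketched as a DTW-style accumulation followed by backtracking. The function name and the particular three monotone moves are illustrative assumptions; the thesis's exact cost formulation and path constraints may differ.

```python
import numpy as np

def best_alignment_path(S):
    """Given a similarity matrix S (score events x detected onsets),
    accumulate similarity by dynamic programming and backtrack the
    maximizing monotonic path (DTW, maximizing instead of minimizing)."""
    n, m = S.shape
    D = np.full((n, m), -np.inf)
    D[0, 0] = S[0, 0]
    # Fill first column and row, then the interior with three monotone moves.
    for i in range(1, n):
        D[i, 0] = D[i - 1, 0] + S[i, 0]
    for j in range(1, m):
        D[0, j] = D[0, j - 1] + S[0, j]
    for i in range(1, n):
        for j in range(1, m):
            D[i, j] = S[i, j] + max(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end cell to recover the best path.
    path = [(n - 1, m - 1)]
    i, j = n - 1, m - 1
    while (i, j) != (0, 0):
        if i == 0:
            j -= 1
        elif j == 0:
            i -= 1
        else:
            k = int(np.argmax([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
            i, j = [(i - 1, j - 1), (i - 1, j), (i, j - 1)][k]
        path.append((i, j))
    return path[::-1]
```

Each pair (i, j) on the returned path pairs the i-th score event with the j-th detected onset, which is how the alignment is read off once the similarity matrix has been filled in.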

    CHAPTER 1. INTRODUCTION 1
      1.1 DIFFICULTIES AND STANDARD APPROACHES OF AUDIO-TO-SCORE ALIGNMENT 2
      1.2 RELATED WORK 3
      1.3 NOVELTY OF THE PROPOSED METHOD 7
      1.4 ORGANIZATION OF THE DISSERTATION 8
    CHAPTER 2. FUSION METHOD FOR ONSET DETECTION 9
      2.1 FUSION OF ONSET DETECTION FUNCTIONS 9
      2.2 ONSET DETECTION FUNCTIONS AND PEAK SELECTION 10
      2.3 PEAK GROUPING 11
      2.4 FEATURE EXTRACTION 12
      2.5 FINAL CLASSIFIER 15
    CHAPTER 3. SIMILARITY MEASURE 16
      3.1 MODIFIED CONSTANT Q SPECTRUM 16
      3.2 NOTE-MATCHING METHOD 18
      3.3 DYNAMIC PROGRAMMING 21
    CHAPTER 4. SCORE FOLLOWING 24
      4.1 ASSUMPTIONS OF OPERATION 24
      4.2 ONLINE APPROACHES 25
    CHAPTER 5. EXPERIMENTS 29
      5.1 DATASETS 29
      5.2 EXPERIMENT 1: ONSET DETECTION 30
      5.3 EXPERIMENT 2: OFFLINE AUDIO-TO-SCORE ALIGNMENT 31
      5.4 EXPERIMENT 3: ONLINE AUDIO-TO-SCORE ALIGNMENT 36
    CHAPTER 6. CONCLUSIONS AND FUTURE WORK 38
    REFERENCES 39

    [1] Meinard Müller, “Music Synchronization,” in Information Retrieval for Music and Motion, Springer, 2007, pp. 85–108.
    [2] Siying Wang, Sebastian Ewert, and Simon Dixon, “Robust and Efficient Joint Alignment of Multiple Musical Performances,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 11, pp. 2132–2145, Nov. 2016.
    [3] Chun-Ta Chen, Jyh-Shing Roger Jang, and Wenshan Liou, “Improved Score-Performance Alignment Algorithms on Polyphonic Music,” in Proceedings of the 39th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 1365–1369.
    [4] Alexander Lerch, “Alignment,” in An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics, John Wiley & Sons, Inc., New Jersey, 2012, pp. 148–149.
    [5] N. Orio, S. Lemouton, and D. Schwarz, “Score following: State of the art and new developments,” in Proceedings of the 2003 Conference on New Interfaces for Musical Expression (NIME), Montreal, Canada, 2003, pp. 34–41.
    [6] A. Arzt, G. Widmer, and S. Dixon, “Automatic page turning for musicians via real-time machine listening,” in Proceedings of the European Conference on Artificial Intelligence (ECAI), 2008, pp. 241–245.
    [7] Shinji Sako, Ryuichi Yamamoto, and Tadashi Kitamura, “Ryry: A Real-Time Score-Following Automatic Accompaniment Playback System Capable of Real Performances with Errors, Repeats and Jumps,” in Active Media Technology: 10th International Conference (AMT 2014), Warsaw, Poland, August 2014, pp. 134–145.
    [8] Zhiyao Duan and Bryan Pardo, “Soundprism: An online system for score-informed source separation of music audio,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 6, pp. 1205–1215, 2011.
    [9] Juanjuan Cai, Yiyun Guo, Hui Wang, and Ying Wang, “Score-informed source separation based on real-time polyphonic score-to-audio alignment and Bayesian harmonic model,” in International Conference on Computational Intelligence and Communication Networks, 2014, pp. 672–680.
    [10] N. Hu, R. Dannenberg, and G. Tzanetakis, “Polyphonic audio matching and alignment for music retrieval,” in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, October 2003.
    [11] R. Dannenberg and N. Hu, “Polyphonic Audio Matching for Score Following and Intelligent Audio Editors,” in Proceedings of the 2003 International Computer Music Conference, San Francisco: International Computer Music Association, 2003, pp. 27–34.
    [12] N. Orio and D. Schwarz, “Alignment of Monophonic and Polyphonic Music to a Score,” in Proceedings of the 2001 International Computer Music Conference (ICMC), 2001, pp. 155–158.
    [13] J. J. Carabias-Orti, F. J. Rodriguez-Serrano, P. Vera-Candeas, N. Ruiz-Reyes, and F. J. Canadas-Quesada, “An audio to score alignment framework using spectral factorization and dynamic time warping,” in Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 2015, pp. 742–748.
    [14] F. J. Rodriguez-Serrano, J. J. Carabias-Orti, P. Vera-Candeas, and D. Martinez-Muñoz, “Tempo Driven Audio-to-Score Alignment Using Spectral Decomposition and Online Dynamic Time Warping,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, January 2017.
    [15] Colin Raffel and Daniel P. W. Ellis, “Optimizing DTW-based audio-to-MIDI alignment and matching,” in Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 81–85.
    [16] R. Dannenberg, “An on-line algorithm for real time accompaniment,” in Proceedings of the 1984 International Computer Music Conference, 1984, pp. 193–198.
    [17] Arshia Cont, “Realtime Audio to Score Alignment for Polyphonic Music Instruments, using Sparse Non-Negative Constraints and Hierarchical HMMs,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006, pp. 245–248.
    [18] C. Joder, S. Essid, and G. Richard, “A conditional random field framework for robust and scalable audio-to-score matching,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 8, pp. 2385–2397, 2011.
    [19] Matthias Dorfer, Andreas Arzt, and Gerhard Widmer, “Towards score following in sheet music images,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2016.
    [20] Cyril Joder, Slim Essid, and Gaël Richard, “Learning Optimal Features for Polyphonic Audio-to-Score Alignment,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2118–2128, 2013.
    [21] Chun-Ta Chen, Jyh-Shing Roger Jang, Wen-Shan Liou, and Chi-Yao Weng, “An efficient method for polyphonic audio-to-score alignment using onset detection and constant Q transform,” in Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 2802–2806.
    [22] J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, and M. Davies, “A tutorial on onset detection in music signals,” IEEE Transactions on Speech and Audio Processing, vol. 13, 2005.
    [23] S. Dixon, “Onset Detection Revisited,” in Proceedings of the International Conference on Digital Audio Effects (DAFx-06), September 2006, pp. 133–137.
    [24] A. Lacoste and D. Eck, “Onset detection with artificial neural networks,” in Proceedings of the International Conference on Music Information Retrieval, 2005.
    [25] F. Eyben, S. Böck, B. Schuller, and A. Graves, “Universal onset detection with bidirectional long short-term memory neural networks,” in Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), 2010, pp. 589–594.
    [26] J. Schlüter and S. Böck, “Musical onset detection with convolutional neural networks,” in Proceedings of the 6th International Workshop on Machine Learning and Music (MML), Prague, Czech Republic, 2013.
    [27] Jan Schlüter and Sebastian Böck, “Improved musical onset detection with convolutional neural networks,” in Proceedings of the 39th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
    [28] N. Degara-Quintela, A. Pena, and S. Torres-Guijarro, “A comparison of score-level fusion rules for onset detection in music signals,” in Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR), 2009, pp. 117–121.
    [29] C. Duxbury, J. P. Bello, M. Davies, and M. Sandler, “A combined phase and amplitude based approach to onset detection for audio segmentation,” in Proceedings of the European Workshop on Image Analysis for Multimedia Interactive Services, 2003.
    [30] A. Holzapfel, Y. Stylianou, A. C. Gedik, and B. Bozkurt, “Three dimensions of pitched instrument onset detection,” IEEE Transactions on Audio, Speech, and Language Processing, 2010.
    [31] Mi Tian, G. Fazekas, Dawn A. A. Black, and Mark B. Sandler, “Design and Evaluation of Onset Detectors using Different Fusion Policies,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2014, pp. 631–636.
    [32] S. Böck and G. Widmer, “Maximum Filter Vibrato Suppression for Onset Detection,” in Proceedings of the 16th International Conference on Digital Audio Effects (DAFx), 2013.
    [33] S. Böck and G. Widmer, “Local group delay based vibrato and tremolo suppression for onset detection,” in Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), 2013.
    [34] Sebastian Böck, Filip Korzeniowski, Jan Schlüter, Florian Krebs, and Gerhard Widmer, “madmom: a new Python Audio and Music Signal Processing Library,” in Proceedings of the 2016 ACM Multimedia Conference (MM '16), 2016, pp. 1174–1178.
    [35] Nobutaka Ono, Kenichi Miyamoto, Hirokazu Kameoka, Jonathan Le Roux, Yuuki Uchiyama, Emiru Tsunoo, Takuya Nishimoto, and Shigeki Sagayama, “Harmonic and percussive sound separation and its application to MIR-related tasks,” in Advances in Music Information Retrieval, vol. 274, pp. 213–236, 2010.
    [36] Hideyuki Tachibana, Nobutaka Ono, Hirokazu Kameoka, and Shigeki Sagayama, “Harmonic/Percussive Sound Separation Based on Anisotropic Smoothness of Spectrograms,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, pp. 2059–2073, 2014.
    [37] J. Salamon, E. Gómez, D. P. W. Ellis, and G. Richard, “Melody extraction from polyphonic music signals: Approaches, applications and challenges,” IEEE Signal Processing Magazine, vol. 31, no. 2, pp. 118–134, 2014.
    [38] Y. Ueda, Y. Uchiyama, T. Nishimoto, N. Ono, and S. Sagayama, “HMM-based approach for automatic chord detection using refined acoustic features,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010, pp. 5518–5521.
    [39] Judith C. Brown, “Calculation of a constant Q spectral transform,” Journal of the Acoustical Society of America, vol. 89, no. 1, pp. 425–434, 1991.
    [40] Arshia Cont, Diemo Schwarz, Norbert Schnell, and Christopher Raphael, “Evaluation of Real-Time Audio-to-Score Alignment,” in Proceedings of the International Symposium on Music Information Retrieval (ISMIR), Vienna, Austria, 2007.
    [41] A. Lacoste and D. Eck, “A supervised classification algorithm for note onset detection,” EURASIP Journal on Applied Signal Processing, Aug. 2007.
