Music Instrument Family Classification | National Tsing Hua University Master's and Doctoral Theses Repository

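Graduate student:	Huang, Ian Shiue (黃彥學)
Thesis title:	Music Instrument Family Classification (自動樂器家族分類)
Advisor:	Liu, Yi Wen (劉奕汶)
Committee members:	Lee, Chi Chun (李祈均); Chen, Hsin (陳新); Chan, Chi Keung (陳志強)
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication:	2016
Academic year of graduation:	105 (ROC calendar)
Language:	English
Number of pages:	59
Keywords (Chinese):	音樂訊號處理 (music signal processing), 機器學習 (machine learning), 音色分類 (timbre classification)
Keywords (English):	music signal processing, machine learning, timbre classification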

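A typical band has five musicians: a lead vocalist, an electric guitarist, an electric bassist, a drummer, and a keyboardist. Keyboardists commonly face a shortage of published keyboard scores, so they have to consult the other players' scores to follow the progression of a song. Those scores usually lack instrument information, so the player cannot tell which instrument should be emulated on the keyboard at a given moment. To address this problem, we used prerecorded audio of thirty different instruments to build one-second clips for six instrument families, and then mixed these six families systematically to generate data for fifteen two-family (duo) mixtures and twenty three-family (trio) mixtures. For the resulting forty-one classes of one-second clips, time-domain and frequency-domain signals were stacked to form the feature vectors, and several machine learning algorithms were applied so that the system could classify the instruments automatically. The nearest-neighbor method achieved the best accuracy in both validation and testing, at 71.1% and 65.2% respectively. In addition, we designed a ten-question hearing test consisting of nine questions based on two-second clips and one trap question. Each of the nine questions is multiple-response and counts as correct only if every option is answered correctly; a respondent counts as a valid sample only if the trap question is answered correctly. The test was meant to check whether the algorithms we used surpass human ability. A total of 498 people took part, of whom only 301 were valid samples. Participants were divided into three levels by musical ability. The highest-level group did outperform the system, but on average the machine outperformed humans.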
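The class count in the abstract can be checked with a short sketch. It assumes that each unordered choice of two or three families forms one mixture class, which is consistent with the stated fifteen duo and twenty trio mixtures. The family names are hypothetical placeholders, since the abstract does not list them.

```python
from itertools import combinations

# Hypothetical placeholder names: the abstract only says there are six families.
FAMILIES = [f"family_{i}" for i in range(1, 7)]


def build_labels(families, max_size=3):
    """Enumerate single families plus every unordered mixture up to max_size."""
    labels = []
    for size in range(1, max_size + 1):
        labels.extend(combinations(families, size))
    return labels


labels = build_labels(FAMILIES)

# Count how many labels contain one, two, or three families.
counts = {size: sum(1 for lab in labels if len(lab) == size) for size in (1, 2, 3)}

print(counts)       # {1: 6, 2: 15, 3: 20}
print(len(labels))  # 41
```

Six single families, fifteen pairs, and twenty triples add up to the forty-one classes reported above.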

A typical music band is composed of a vocalist, an electric guitarist, an electric bassist, a drummer, and a keyboardist. The task of a keyboardist is to make appropriate use of the instrument sounds available on a keyboard. Nevertheless, keyboard sheet music is hard to obtain. A beginning keyboardist usually practices from guitar tabs, so the information about which instrument to choose is lost. In this thesis, we build a classification system in an attempt to solve this problem. The data for each music instrument family consist of one-second clips at various pitches. In addition, duo-timbre and trio-timbre mixtures are generated, and each mixture serves as a separate label. The feature vectors are composed of a low-pass filtered power spectrogram, a high-pass filtered power spectrogram, a chromagram, and the time-domain waveform. Several machine learning methods were applied separately, yet not all of them performed well. The k-nearest neighbors method gave the most accurate results in both the validation step (71.1%) and the testing step (65.2%). We also carried out a hearing test to understand whether humans can compete with computers at this classification task. On average, human accuracy was lower than the computer's.
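The idea of stacking time-domain and frequency-domain features and classifying by nearest neighbor can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy sine-based clips, the class names `pure` and `rich`, the single naive DFT power spectrum (standing in for the filtered spectrograms and chromagram), the Euclidean distance, and `k=1` are all assumptions made for illustration.

```python
import cmath
import math


def power_spectrum(signal):
    """Naive DFT power spectrum over the first half of the frequency bins."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def feature_vector(signal):
    """Stack the time-domain waveform and its power spectrum into one vector."""
    return list(signal) + power_spectrum(signal)


def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors nearest to the query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def tone(freq, harmonics, n=64):
    """Toy clip (assumption): harmonics of a base frequency, in cycles per window."""
    return [sum(a * math.sin(2 * math.pi * freq * (h + 1) * t / n)
                for h, a in enumerate(harmonics))
            for t in range(n)]


# Two hypothetical timbre classes, each recorded at two pitches.
train = [
    (feature_vector(tone(3, (1.0,))), "pure"),
    (feature_vector(tone(5, (1.0,))), "pure"),
    (feature_vector(tone(3, (1.0, 0.8, 0.6))), "rich"),
    (feature_vector(tone(5, (1.0, 0.8, 0.6))), "rich"),
]

# Query: a harmonic-rich clip with slightly different harmonic amplitudes.
query = feature_vector(tone(5, (1.0, 0.7, 0.5)))
prediction = knn_predict(train, query, k=1)
print(prediction)
```

In this toy setup the query lies closest to the harmonic-rich training clip at the same pitch. Raw stacked features like these are sensitive to pitch, which is presumably one reason the thesis also uses a chromagram and pooling (see Methods 2.3 in the outline below).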

Abstract (Chinese)    i
Abstract    ii
Introduction    1
1 Music instruments    2
2 Timbres    2
3 Music instrument families    5
4 Literature review    6
5 Motivation    7
Methods    10
1 Training databases    10
2 Feature extraction    11
2.1 Time domain features    13
2.2 Frequency domain features    13
2.3 Pooling    17
2.4 Summary    19
3 Machine learning algorithms    19
3.1 k-nearest neighbors    20
3.2 Support vector machines    21
3.3 Neural networks    22
3.4 Nearest neighbor of sparse coding    27
3.5 Principal components analysis    30
4 Testing database    30
5 Block diagrams    32
6 Hearing test    34
Results    36
1 Cross-validation    36
2 Testing    42
3 Hearing tests and overall accuracy    44
4 Summary    45
Discussion    47
1 Classifying distribution    47
2 kNN versus NNSC    49
3 Hearing tests    51
Conclusion and future works    53
Reference    55
Appendix    57

                                

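Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)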
