AutoRhythm: A Music Game with Automatic Hit-Timing Generation and Percussion Identification

Graduate student: 葉子雋 Yeh, Tzu-Chun
Thesis title: AutoRhythm: 以自動打擊點生成與敲擊辨識為基礎的音樂遊戲 / AutoRhythm: A Music Game with Automatic Hit-Timing Generation and Percussion Identification
Advisors: 張智星 Jang, Jyh-Shing Roger; 張俊盛 Chang, Jyun-Sheng Jason
Oral examination committee: 陳煥宗 Chen, Hwann-Tzong; 蔡銘峰 Tsai, Ming-Feng; 蘇黎 Su, Li
Degree: Doctoral (Ph.D.)
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, i.e., 2019–2020)
Language: English
Number of pages: 101
Keywords: Onset detection, Rhythm game, Percussion identification

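This thesis proposes a music game system, AutoRhythm, which automatically generates game charts from music chosen by the players themselves. In addition, AutoRhythm offers a novel way for users to interact with the game through percussion instruments of their own choosing. AutoRhythm uses onset detection to generate the game charts automatically. We also propose a percussion identification model that users can quickly train for the instruments played by their left and right hands. The model is trained on the positions of the salient spectral features of the different instruments, and a simple threshold classifier runs in real time during the game to distinguish the user's different percussive hits. To suppress interference from the background music being picked up again by the microphone, we propose an active noise cancellation system based on an autoregressive model, which improves percussion detection. On 100 recordings collected from 10 users, percussion identification achieves an F-measure of 78.22%, which is sufficient for gameplay.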
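The abstract describes training on the location of each instrument's salient spectral feature and separating left-hand and right-hand hits with a simple threshold. The sketch below illustrates that idea under stated assumptions: the feature is taken to be the frequency of the strongest bin in one Hann-windowed frame at the hit, and the threshold is (hypothetically) the midpoint between the two classes' mean peak frequencies. The thesis's exact feature and threshold rule are not given here, and `peak_frequency` and `PeakThresholdClassifier` are illustrative names only.

```python
import numpy as np


def peak_frequency(frame: np.ndarray, sr: int) -> float:
    """Return the frequency (Hz) of the strongest bin in the frame's power spectrum."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(freqs[np.argmax(power)])


class PeakThresholdClassifier:
    """Two-class (e.g. left-hand vs right-hand) classifier using one frequency threshold.

    Hypothetical rule: the threshold is the midpoint between the mean peak
    frequencies of the two sets of training hits.
    """

    def __init__(self, sr: int):
        self.sr = sr
        self.threshold = None
        self.low_label = None   # label of the class with the lower mean peak frequency
        self.high_label = None  # label of the class with the higher mean peak frequency

    def fit(self, frames_a, frames_b, labels=("left", "right")):
        mean_a = np.mean([peak_frequency(f, self.sr) for f in frames_a])
        mean_b = np.mean([peak_frequency(f, self.sr) for f in frames_b])
        self.threshold = (mean_a + mean_b) / 2.0
        # Remember which label lies below the threshold.
        if mean_a <= mean_b:
            self.low_label, self.high_label = labels
        else:
            self.high_label, self.low_label = labels
        return self

    def predict(self, frame: np.ndarray) -> str:
        if self.threshold is None:
            raise RuntimeError("call fit() first")
        f = peak_frequency(frame, self.sr)
        return self.low_label if f < self.threshold else self.high_label


if __name__ == "__main__":
    sr = 16000
    t = np.arange(1024) / sr

    def hit(freq):
        # Synthetic decaying tone standing in for a recorded percussive hit.
        return np.sin(2 * np.pi * freq * t) * np.exp(-30 * t)

    clf = PeakThresholdClassifier(sr).fit([hit(300), hit(320)], [hit(2000), hit(2100)])
    print(clf.predict(hit(350)), clf.predict(hit(1900)))  # → left right
```

A single scalar threshold keeps per-hit cost to one FFT and one comparison, which suits real-time play; it only works when the two chosen objects have clearly separated dominant frequencies.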

This thesis describes AutoRhythm, a music rhythm game that automatically generates hit timings (the game content) from a given piece of music and identifies, in real time, percussive sounds that the user makes with real objects of their choosing. Because the hit timings are generated by onset detection, the user can play the game with any music from their own collection. To make the game more realistic, AutoRhythm also lets the user interact with the game by striking any object that produces a percussive sound; these hits are identified in real time and replace tapping on the screen. Identification is based on the frame-based power spectrum of the recording after background music reduction. This reduction follows the concept of active noise cancellation: the playback music, as picked up by the microphone, is estimated and subtracted from the original recording. On a test dataset of 100 recordings, our system achieves an F-measure of 78.22%, which outperforms other well-known classifiers and is satisfactory for gameplay.
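Background music reduction by estimating and subtracting the recorded playback can be sketched as follows. This assumes the game has access to the playback signal it sends to the speaker and that the speaker-to-microphone path is a linear FIR filter; it is a stand-in for the thesis's autoregressive formulation, whose details are not given here. The filter is fitted by ordinary least squares and the predicted music is subtracted from the recording. `lagged_matrix`, `reduce_background`, and the `order` parameter are hypothetical.

```python
import numpy as np


def lagged_matrix(x: np.ndarray, order: int) -> np.ndarray:
    """Row n holds [x[n], x[n-1], ..., x[n-order+1]], with zeros before the signal starts."""
    n = len(x)
    if order > n:
        raise ValueError("order must not exceed the signal length")
    X = np.zeros((n, order))
    for k in range(order):
        X[k:, k] = x[: n - k]
    return X


def reduce_background(recording, playback, order: int = 32):
    """Subtract a least-squares estimate of the playback music from the recording.

    Assumes recording ~= (playback filtered by an unknown FIR path of length `order`)
    + percussion. Returns the residual (ideally mostly percussion) and the fitted filter.
    """
    d = np.asarray(recording, dtype=float)
    X = lagged_matrix(np.asarray(playback, dtype=float), order)
    h, *_ = np.linalg.lstsq(X, d, rcond=None)  # minimizes ||d - X h||^2
    estimated_music = X @ h
    return d - estimated_music, h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8000
    playback = rng.standard_normal(n)          # stand-in for the game's music
    path = np.array([0.0, 0.6, 0.3, -0.1])     # hypothetical speaker-to-mic response
    percussion = np.zeros(n)
    percussion[4000:4200] = np.hanning(200)    # one synthetic hit
    recording = np.convolve(playback, path)[:n] + percussion
    residual, h = reduce_background(recording, playback)
    print(np.round(h[:4], 2))
```

Because the percussion is uncorrelated with the playback, it barely biases the fit and survives in the residual. In a real-time game the coefficients would presumably be refitted over sliding windows or tracked with an adaptive filter; the batch fit here only illustrates the subtraction.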

Chapter 1. Introduction
1 Related Work: Automatic Content Generation
2 Related Work: Onset Detection
3 Related Work: User Percussion Identification
4 Related Work: Active Noise Cancellation
Chapter 2. System and Basics
1 Onset Detection
2 Harmonic/Percussive Source Separation
3 Procedural Content Generation
Chapter 3. Methods for AutoRhythm
1 Hit-timing Generation
1.1 Harmonic/Percussive Source Separation
1.2 Onset Detection
1.3 Post-processing: Result Combination and Nearby Onset Removal
1.4 Grouping
2 User Percussion Identification
3 Background Music Reduction
Chapter 4. Experiments
1 Experiment on Hit-timing List Generation
1.1 Dataset Description
1.2 Evaluation Metrics
1.3 Experiment on Baseline Onset Detection Functions
1.4 Experiment on Performing HPSS
1.5 Experiment on Type-1 Ratios
2 Experiment on User Percussion Identification
3 Experiment on Background Music Reduction
Chapter 5. Conclusions and Future Work
References

                                

[1] Tapulous, (2008) Tap Tap Revenge. Since the game is not available, a brief reference from Wikipedia is here: https://en.wikipedia.org/wiki/Tap_Tap_Revenge
[2] Rayark, (2011) Cytus. Available: https://www.rayark.com/
[3] Rayark, (2013) Deemo. Available: https://www.rayark.com/
[4] Konami (2012) Jubeat Plus. Available: http://www.konami.jp/jubeatplus/index.php5
[5] Cheetar Technology Co. Ltd., (2014) Piano Tiles. Available on App Store: https://itunes.apple.com/uz/app/piano-tiles/id848160327?mt=8
[6] Tap Lab, (2017) Dream Piano. Available on Google Play: https://play.google.com/store/apps/details?id=com.eyu.piano&hl=en_US
[7] Amanotes, (2018) Tiles Hop. Available on Google Play: https://play.google.com/store/apps/details?id=com.amanotes.beathopper&hl=en
[8] CreApptive, (2013) BeatMp3. Available on Google Play: https://play.google.com/store/apps/details?id=com.studio7775.BeatMP3&hl=en
[9] SmartPlayland, (2015) TapTube. Available on Google Play: https://play.google.com/store/apps/details?id=com.joylol.taptube&hl=en
[10] Float32, (2017) Melobeat. Available on Google Play: https://play.google.com/store/apps/details?id=com.float32.themelobeat&hl=en
[11] Pocket Games, (2015) Musiverse. Available on Google Play: https://play.google.com/store/apps/details?id=com.pocketgames.musiverse
[12] Konami, (1998) Dance Dance Revolution. A newest version (A20) is available at: https://p.eagate.573.jp/game/ddr/ddra20/p/?___REDIRECT=0
[13] Sega, (1999) Samba de Amigo. Since the game is not available, a brief reference from Wikipedia is here: https://en.wikipedia.org/wiki/Samba_de_Amigo
[14] Konami (1999) DrumMania. Available: https://p.eagate.573.jp/game/gfdm/gitadora_matixx/p/index.html?___REDIRECT=0
[15] Activision (2006) Guitar Hero. Available: https://www.guitarhero.com/
[16] Beat Games, (2018) BeatSaber. Available on Steam: https://store.steampowered.com/app/620980/Beat_Saber/
[17] Audio Surf LLC. (2016) AudioSurf. Available on Steam: https://store.steampowered.com/app/412740/Audioshield/
[18] Cold Beam Games, (2009) Beat Hazard. Available on Steam: https://store.steampowered.com/app/49600/Beat_Hazard/
[19] Empty Clip Studios, (2012) Symphony. Available on Steam: https://store.steampowered.com/app/207750/Symphony/
[20] N. Shaker, J. Togelius, and M. J. Nelson, “Procedural Content Generation in Games: A Textbook and an Overview of Current Research,” Springer, 2016
[21] Blizzard (1999), Diablo II, Available: https://www.blizzard.com/zh-tw/games/d2/
[22] Runic Games (2009), Torch Light, Available: https://www.torchlight1.com/en
[23] Grinding Gear Games (2013), Path of Exile, Available: https://www.pathofexile.com/
[24] A. Jordan, D. Scheftelowitsch, J. Lahni, J. Hartwecker, M. Kuchem, M. WalterHuber, N. Vortmeier, T. Delbr¨ugger, U. G¨uler, I. Vatolkin, and M. Preuß. BeatThe- “Beat: Music-based procedural content generation in a mobile game,” In Proceedings of the IEEE Conference on Computational Intelligence and Games (CIG), pp. 320–327, 2012
[25] S. Dixon, "Simple Spectrum-Based Onset Detection," Extended Abstract on 2nd Music Information Retrieval Evaluation eXchange (MIREX 2006).
[26] J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. Sandler, "A tutorial on onset detection in musical signals," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 1035–1047, 2005.
[27] J. P. Bello, C. Duxbury, M. Davies, and M. Sandler, "On the use of phase and energy for musical onset detection in the complex domain," IEEE Signal Processing Letters, vol. 11, no. 6, pp. 553–556, 2004.
[28] S. Böck, F. Krebs, and M. Schedl, "Evaluating the online capabilities of onset detection methods," in Proc. of the 14th International Conference on Music Information Retrieval (ISMIR), Porto, 2012.
[29] J. Schlüter and S. Böck, “Improved musical onset detection with convolutional neural networks,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2014, pp. 6979–6983.
[30] D. Wessel and M. Wright, “Problems and prospects for intimate musical control of computers,” Computer Music J., vol. 26, no. 3, pp. 11–22, 2002.
[31] P. Herrera, A. Yeterian, and F. Gouyon, "Automatic Classification of Drum Sounds: A Comparison of Feature Selection Methods and Classification Techniques," in The International Conference on Music and Artificial Intelligence (ICMAI), 2002.
[32] W. A. Schloss, "On the Automatic Transcription of Percussive Music," in Acoustic Signal to High-level Analysis. STAN-M-27, Stanford, CA, CCRMA, Department of Music, Stanford University, 1985.
[33] F. Gouyon, F. Pachet, and O Delerue, "On the Use of Zero-Crossing Rate for and Application of Classification of Percussive Sounds," in the Proc. of the COST G-6 Conference on Digital Audio Effect, 2000.
[34] K. Yoshii, M. Goto, and H. G. Okuno, "Automatic Drum Sound Description For Real-world Music Using Template Adaptation and Matching Methods," in Proc. of International Conference on Music Information Retrieval, 2004.
[35] K. Yoshii, M. Goto, and H. Okuno, “AdaMast: a Drum Sound Recognizer based on Adaptation and Matching of Spectrogram Templates,” in Proc. Music Information Retrieval Evaluation eXchange (MIREX), 2005.
[36] K. Yoshii, M. Goto, and H. Okuno, “Drum Sound Recognition for Polyphonic Audio Signals by Adaptation and Matching of Spectrogram Templates with Harmonic Structure Suppression,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 1, pp. 333–345, 2007.
[37] C. Dittmar and D. Gartner, “Real-time Transcription and Separation of Drum Recordings based on NMF Decomposition. In Proc. of the 17th International Conference on Digital Audio Effects (DAFX), 2014
[38] P. Lueg, "Process of silencing sound oscillations," U.S. Patent No. 2043416, Filed March 8th, 1934, Issued June 9th, 1936.
[39] L. J. Fogel, "Method of improving intelligence under random noise interference," U.S. Patent No. 2866848, Filed April 2nd, 1954, Issued December 30th, 1958.
[40] B. Widrow and M. A. Lehr, "Noise Canceling and Channel Equalization," in The Handbook of Brain Theory and Neural Networks. Michael A. Arbib (Ed.). The MIT Press, Cambridge, Massachusetts, London, England. 648-650, 1995.
[41] A. V. Oppenheim, E. Weinstein, K. Zangi, M. Feder, and D. Gauger, "Single-Sensor Active Noise Cancellation," in the IEEE Transaction on Speech and Audio Processing, vol. 2, no. 2, pp. 285-290, 1994.
[42] B. Benoit, C. Camastra, M. Kenny, K. Li, R. Romanowski, and S. Kevin, "Engineering Silence: Active Noise Cancellation," Published by the North Carolina State University, 2012
[43] S. Liebich, C. Anemüller, P. Vary, P. Jax, D. Rüschen, and S. Leonhardt, "Active noise cancellation in headphones by digital robust feedback control," in Proc. of the 24th European Signal Processing Conference (EUSIPCO), 2016.
[44] P.-P. Chen, T.-C. Yeh, and J.-S. R. Jang, "AutoRhythm: A Music Game with Automatic Hit-Timing Generation and Percussion Identification", in Proc. of IEEE International Conference on Multimedia & Expo 2015.
[45] B. C. J. Moore, An Introduction to the Psychology of Hearing, 5th ed. New York: Academic, 1997
[46] M. Goto and Y. Muraoka, “Beat Tracking Based on Multiple-agent Architecture—A Real-Time Beat Tracking System for Audio Signals,” in Proc. 2nd Int. Conf. Multiagent Systems, pp. 103–110, Dec. 1996,
[47] E. D. Scheirer, “Tempo and beat analysis of acoustic musical signals,” Journal of Acoustic. Society of America, vol. 103, no. 1, pp. 588–601, Jan. 1998.
[48] R. J. McAulay and T. F. Quatieri, “Speech analysis/synthesis based on a sinusoidal representation,” IEEE Transaction of Acoustic, Speech, Signal Process., vol. ASSP-34, pp. 744–754, 1986
[49] X. Serra and J. O. Smith, “Spectral modeling synthesis: a sound analysis/synthesis system based on a deterministic plus stochastic decomposition,” Comput. Music J., vol. 14, no. 4, pp. 12–24, winter 1990.
[50] S. Levine, “Audio Representations for Data Compression and Compressed Domain Processing,” Ph.D. dissertation, Stanford Univ., Stanford, CA, 1998
[51] T. Verma, S. Levine, and T. Meng, “Transient modeling synthesis: A flexible analysis/synthesis tool for transient signals,” in Proceedings of International Computer Music Conference, Thessaloniki, Greece, pp. 164–167, 1997
[52] T. Verma and T. Meng, “Sinusoidal modeling using frame-based perceptually weighted matching pursuits,” in Proceedings of International Conference of Acoustics, Speech, and Signal Processing, vol. 2, Phoenix, AZ, pp. 981–998, 1999
[53] Z. Settel and C. Lippe, “Real-time musical applications using the FFT-based resynthesis,” in Proceedings of International Computer Music Conference (ICMC94), Aarhus, Denmark, pp. 338–343, 1994
[54] C. Duxbury, M. Davies, and M. Sandler, “Extraction of transient content in musical audio using multiresolution analysis techniques,” in Proceedings of Digital Audio Effects Conf. (DAFX ’01), Limerick, Ireland, pp. 1–4, 2001
[55] S. Shlien, “The modulated lapped transform, its time-varying forms, and its applications to audio coding standards,” IEEE Transaction of Speech and Audio Processing, vol. 5, no. 4, pp. 359–366, 1997
[56] M. Purat and P. Noll, “Audio coding with a dynamic wavelet packet decomposition based on frequency-varying modulated lapped transforms,” in Proceedings of International Conference of Acoustic, Speech, and Signal Processing (ICASSP), Atlanta, GA, pp. 1021–1024, 1996
[57] J. Princen and A. Bradley, “Analysis/synthesis filter bank design based on time domain aliasing cancellation,” IEEE Transaction of Acoustic, Speech, Signal Processing, vol. ASSP-34, no. 5, pp. 1153–1161, Oct. 1986.
[58] L. Daudet and B. Torrésani, “Hybrid representations for audiophonic signal encoding,” Signal Processing, vol. 82, no. 11, pp. 1595–1617, 2002
[59] P. Masri, "Computer modeling of Sound for Transformation and Synthesis of Musical Signal, " PhD thesis, University of Bristol, 1996.
[60] P. Brossier, J. P. Bello and M. D. Plumbley, "Real-time temporal segmentation of note objects in music signals, " in Proceedings of the International Computer Music Conference (ICMC 2004), Miami, Florida, USA, Nov., 2004.
[61] S. Dixon, "Onset Detection Revisited," in Proceedings of the 9th International Conference on Digital Audio Effects (DAFx’06), Montreal, Canada, September 18-20, 2006
[62] C. Duxbury, J.P. Bello, M. Davies, and M. Sandler, “A Combined Phase and Amplitude Based Approach to Onset Detection for Audio Segmentation,” in Proceedings of the 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS-03), 2003, pp. 275–280
[63] S. Kullback and R.A. Leibler, "On information and sufficiency," Annals of Mathematical Statistics. Vol. 22, No. (1), pp. 79–86, 1951
[64] P. M. Brossier, “Automatic Annotation of Musical Audio for Interactive Applications,” PHD Thesis, Centre for Digital Music, Queen Mary, University of London, 2006.
[65] S. Hainsworth and M. Macleod, “Onset detection in musical audio signals,” in Proceedings of the International Computer Music Conference, 2003
[66] R.A. Haddad and A.N. Akansu, "A Class of Fast Gaussian Binomial Filters for Speech and Image Processing," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 39, pp 723-727, March 1991.
[67] X. Rodet and F. Jaillet, “Detection and modeling of fast attack transients,” in Proc. Int. Computer Music Conf., Havana, Cuba, 2001, pp. 30–33.
[68] N. Ono, K. Miyamoto, J. L. Roux, H. Kameoka, and S. Sagayama, "Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram," in Proc. of the European Signal Processing Conference, 2008.
[69] H. Tachibana, T. Ono, N. Ono, and S. Sagayama, “Melody line estimation in homophonic music audio signals based on temporal-variability of melody source”, IEEE ICASSP, pp. 425-428, 2010
[70] C.-L. Hsu, L.-Y. Chen, J.-S. R. Jang, and H.-J. Li, “Singing Pitch Extraction from Monaural Polyphonic Songs by Contextual Audio Modeling and Singing Harmonic Enhancement”, in the proceedings of International Society for Music Information Retrieval (ISMIR 2009), Kobe, Japan, Oct. 2009.
[71] J. Togelius, G. Yannakakis, K.O. Stanley, C. Browne, “Search-Based Procedural Content Generation: A Taxonomy and Survey,” IEEE Transactions on Computational Intelligence and AI in Games, Vol. 3, pp 172 - 186, 2011
[72] M. Hendrikx, S. Meijer, J. V. D. Velden, and A. Iosup “Procedural Content Generation for Games: A Survey,” ACM Transactions on Multimedia Computing, Communications and Applications, Vol. 9, No. 1, Article 1, Feb. 2013
[73] M. Edwards, “Algorithmic Composition: Computational Thinking in Music,” Communication of ACM, Vol 54, pp 58–67, 2011
[74] J. Dorsey and H. Rushmeier, “Advanced material appearance modeling, ” In Proceedings of ACM SIGGRAPH 2009 Courses, New York, NY, USA, pp 1–134
[75] D. S. Ebert, F. K. Musgrave, D. Peachey, K. Perlin, S. Worley, “Texturing and Modeling: A Procedural Approach,” in 3rd Edition of Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2002
[76] Blizzard (2005), World of Warcraft, Available: https://worldofwarcraft.com/
[77] C. Alexander, “A Pattern Language: Towns, Buildings, Construction,” Oxford University Press, New York, NY, USA, 1977
[78] S. H. Strogatz, “Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering,” Perseus Books Publishing, LLC., 1994
[79] L. Edelstein-Keshet, “Mathematical Models in Biology. Classics in Applied Mathematics Series,” vol. 46. SIAM, 2005
[80] Origin Systems (1997), Ultima Online, Available: https://uo.com/
[81] Maxis (1989), Sim City, Available: https://www.ea.com/games/simcity
[82] Parl AI and Facebook (2019), LIGHT, Available: https://parl.ai/projects/light/
[83] J. Urbanek A. Fan, S. Karamcheti, S. Jain, S. Humeau, E. Dinan, T. Rocktasche, D. Kiela, A. Szlam, J. Weston, “Learning to Speak and Act in a Fantasy Text Adventure Game, ” arXiv:1903.03094v1, Mar, 2019
[84] S. Colton, “Automated puzzle generation,” In Proceedings of Symposium on AI and Creativity in the Arts and Science, 2002
[85] Playabl Studio (2005), Façade, Available: https://www.playablstudios.com/facade
[86] Tarn Adams (2006), Dwarf Fortress, Available: https://store.steampowered.com/app/975370/Dwarf_Fortress/
[87] Epyx (1980), Rogue, Unavailable. Relative wiki page: https://en.wikipedia.org/wiki/Rogue_(video_game) (fetched 2020/04/11)
[88] B. Pell, “A Strategic Metagame Player for General Chess-Like Games,” in the proceedings of 12th Computational Intelligence, pp 177–198, 1993
[89] Blizzard (1998), Starcraft, Available: https://starcraft.com/
[90] Riot (2009), League of Legends, Available: http://leagueoflegends.com/
[91] Blizzard (2015), Overwatch, Available: http://playoverwatch.com/
[92] Hao-Hsun Li, “Automatic Hit Time Generation for Onset-Based Music Games,” Master Thesis, NTU CSIE, 2014
[93] J. Nelder and R. Mead, "A simplex method for function minimization". Computer Journal. 7: 308–313. doi:10.1093/comjnl/7.4.308, 1965.
[94] J. B. MacQueen, “Some Methods for classification and Analysis of Multivariate Observations,” in Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. 1. University of California Press. pp. 281–297, 1967
[95] S. B. Davis, and P. Mermelstein, "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences," in IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), pp. 357–366, 1980
[96] M. Galrinho, "Least Squares Methods for System Identification of Structured Models", Licentiate Thesis, published by KTH School of Electrical Engineering, 2016.
[97] Taiko Jiro’s Dataset. Available: http://tieba.baidu.com/p/1736272776
[98] I. C. A. Oyeka, "An Introduction to Applied Statistical Methods", 8th Edition, Nobern Avocation Publishing Company, Enugu, pp. 496-533, 2009.
[99] G. Tzanetakis and P. Cook, "Musical Genre Classification of Audio Signals," in IEEE Transactions on Audio and Speech Processing, Vol 10, No. 5, 2002. Dataset available at http://opihi.cs.uvic.ca/sound/genres.tar.gz
[100] C.-C. Chang, and C.-J. Lin, "LIBSVM : a library for support vector machines," in ACM Transactions on Intelligent Systems and Technology. 2:27:1--27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.