| Graduate Student: | Zheng, Yu-Fang (鄭語芳) |
|---|---|
| Thesis Title: | From J-Pop to Chiptune: Design and Implementation of a Conversion System Based on Pitch Detection and Matrix Decomposition (將日本流行歌曲轉換為晶片音樂風格 - 結合音高偵測與矩陣分解之轉換系統設計與實作) |
| Advisor: | Liu, Yi-Wen (劉奕汶) |
| Committee Members: | Su, Li (蘇黎); Huang, Yuan-Hao (黃元豪); Cheng, Chiung-Ying (程瓊瑩) |
| Degree: | Master |
| Department: | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication: | 2024 |
| Academic Year of Graduation: | 113 |
| Language: | Chinese |
| Number of Pages: | 52 |
| Keywords: | Music style transfer, Non-negative matrix factorization, Pitch detection, Peak detection |
This research aims to develop an automated system that transforms Japanese pop songs into chiptune style. Chiptune, a music style originating from the video game music of the 1980s, is characterized by simple sound features and a nostalgic ambience. With the recent popularity of retro styles, many music creators have adapted modern pop music into chiptune to offer a unique auditory experience. However, manually converting pop music into chiptune requires a certain level of knowledge in music theory and is time-consuming and labor-intensive. Therefore, this study proposes an automated conversion system that simplifies the transformation process and enhances the quality of the results.
The system integrates two main techniques: pitch detection and Non-negative Matrix Factorization (NMF). First, the open-source source separation model Spleeter splits the original music into vocals, bass, piano, drums, and other components. Pitch detection is then applied separately to the vocals and bass, whose melodies are regenerated with chiptune-style waveforms and resynthesized in the time domain. For the piano and the remaining sounds, NMF decomposes the magnitude spectrum, and the conversion is achieved by replacing the learned template matrix. The drums are further subdivided into kick, snare, hi-hat, and cymbals using NMF; a peak detection algorithm then locates the main hits, which are replaced with typical chiptune drum sounds.
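To make the vocal/bass path concrete, the sketch below separates a song with Spleeter's pre-trained 5-stem model [17], tracks the vocal pitch with librosa's implementation of pYIN [18], and resynthesizes the melody as an NES-style pulse wave. The file names, the 25% duty cycle, and all parameter values are illustrative assumptions rather than the thesis's exact configuration.

```python
import numpy as np
import librosa
import soundfile as sf
from scipy.signal import square
from spleeter.separator import Separator

SR, HOP = 22050, 512

# 1) Split the mix into vocals / drums / bass / piano / other (5-stem model).
separator = Separator('spleeter:5stems')
separator.separate_to_file('song.mp3', 'stems/')  # writes stems/song/vocals.wav, etc.

# 2) Track the vocal melody with pYIN; f0 is NaN on unvoiced frames.
y, _ = librosa.load('stems/song/vocals.wav', sr=SR, mono=True)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C6'),
    sr=SR, frame_length=2048, hop_length=HOP)

# 3) Regenerate the melody as a phase-continuous pulse wave; 25% is one of
#    the duty cycles offered by the NES pulse channels.
out = np.zeros(len(y))
phase = 0.0
for i, pitch in enumerate(f0):
    start, end = i * HOP, min((i + 1) * HOP, len(y))
    if np.isnan(pitch) or start >= end:
        continue
    ph = phase + 2 * np.pi * pitch * np.arange(end - start) / SR
    out[start:end] = 0.3 * square(ph, duty=0.25)
    phase = ph[-1] + 2 * np.pi * pitch / SR  # carry phase into the next frame

sf.write('vocals_chiptune.wav', out, SR)
```

Snapping each detected f0 to the nearest equal-tempered semitone before synthesis would give an even more quantized, chip-like melody.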
In short, by combining pitch detection and matrix decomposition, this research transforms Japanese pop songs into chiptune style, preserving the original song structure while presenting a unique retro electronic sound. Finally, 35 participants were invited to evaluate the conversion results.
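The template-replacement step for the piano and other harmonic stems could look like the following sketch: factor the magnitude spectrogram into templates W and activations H, swap each learned template for an ideal square-wave harmonic comb at its dominant pitch, and resynthesize from the swapped product. The component count, the harmonic model, and Griffin-Lim phase reconstruction are assumptions made for illustration; the thesis may reconstruct phase differently.

```python
import numpy as np
import librosa
import soundfile as sf
from sklearn.decomposition import NMF

SR, N_FFT, HOP = 22050, 2048, 512

y, _ = librosa.load('stems/song/piano.wav', sr=SR, mono=True)
S = np.abs(librosa.stft(y, n_fft=N_FFT, hop_length=HOP))

# Factor the magnitude spectrogram: S ~= W @ H, where the columns of W are
# spectral templates and the rows of H are their activations over time.
nmf = NMF(n_components=24, beta_loss='kullback-leibler', solver='mu',
          init='nndsvda', max_iter=300, random_state=0)
W = nmf.fit_transform(S)   # (freq_bins, 24)
H = nmf.components_        # (24, frames)

# Replace each learned template with an ideal square-wave spectrum
# (odd harmonics, 1/n amplitude) at the template's dominant pitch.
freqs = librosa.fft_frequencies(sr=SR, n_fft=N_FFT)
W_chip = np.zeros_like(W)
for k in range(W.shape[1]):
    f0 = max(freqs[np.argmax(W[:, k])], freqs[1])  # dominant bin -> fundamental
    n = 1
    while n * f0 < SR / 2:
        W_chip[int(round(n * f0 * N_FFT / SR)), k] = 1.0 / n
        n += 2
    # Rescale so the swapped template carries comparable energy.
    W_chip[:, k] *= W[:, k].sum() / max(W_chip[:, k].sum(), 1e-8)

# Resynthesize from the swapped factorization; Griffin-Lim estimates the phase.
y_chip = librosa.griffinlim(W_chip @ H, hop_length=HOP, n_fft=N_FFT)
sf.write('piano_chiptune.wav', y_chip, SR)
```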
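Finally, the drum path described in the abstract (NMF subdivision followed by peak picking) might be sketched as follows, using scipy's find_peaks for hit selection in the spirit of onset detection [19]. The four-component role assignment, the peak thresholds, and the synthesized noise-burst "chip drums" are illustrative assumptions.

```python
import numpy as np
import librosa
import soundfile as sf
from scipy.signal import find_peaks
from sklearn.decomposition import NMF

SR, N_FFT, HOP = 22050, 2048, 512

d, _ = librosa.load('stems/song/drums.wav', sr=SR, mono=True)
Sd = np.abs(librosa.stft(d, n_fft=N_FFT, hop_length=HOP))

# Four components intended to capture kick, snare, hi-hat, and cymbals
# (NMF does not guarantee which component ends up with which role).
nmf = NMF(n_components=4, beta_loss='kullback-leibler', solver='mu',
          init='nndsvda', max_iter=300, random_state=0)
Wd = nmf.fit_transform(Sd)
Hd = nmf.components_  # (4, frames): one activation curve per drum class

def chip_drum(dur=0.08, tone_hz=None):
    """A crude chiptune drum: a decaying noise burst, optionally mixed
    with a low square tone for kick-like hits. Purely illustrative."""
    t = np.arange(int(dur * SR)) / SR
    env = np.exp(-t / 0.02)
    sig = np.random.uniform(-1, 1, len(t)) * env
    if tone_hz:
        sig = 0.5 * sig + 0.5 * np.sign(np.sin(2 * np.pi * tone_hz * t)) * env
    return 0.4 * sig

samples = [chip_drum(tone_hz=55), chip_drum(), chip_drum(0.03), chip_drum(0.25)]

# Pick the main hits on each activation curve and place a chip drum there.
out = np.zeros(len(d))
for k in range(4):
    peaks, _ = find_peaks(Hd[k], height=0.3 * Hd[k].max(), distance=4)
    for p in peaks:
        start = p * HOP
        if start >= len(out):
            continue
        seg = samples[k][:len(out) - start]
        out[start:start + len(seg)] += seg

sf.write('drums_chiptune.wav', out / max(np.abs(out).max(), 1e-8), SR)
```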
References
[1] R. Copetti, “Nintendo Entertainment System (NES) architecture: a practical analysis.” https://www.copetti.org/writings/consoles/nes/, 2019.
[2] K. Driscoll and J. Diaz, “Endless loop: A brief history of chiptunes,” Transformative Works and Cultures, vol. 2, no. 1, 2009.
[3] GASHISOFT, “GXSCC beta 236e.” https://gashisoft.web.fc2.com/P/GsorigE.htm, 2002.
[4] S.-Y. Su, C.-K. Chiu, L. Su, and Y.-H. Yang, “Automatic conversion of pop music into chiptunes for 8-bit pixel art,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 411–415, IEEE, 2017.
[5] J. Driedger, T. Prätzlich, and M. Müller, “Let it Bee: towards NMF-inspired audio mosaicing,” in ISMIR, pp. 350–356, 2015.
[6] D. Lee and H. Seung, “Algorithms for non-negative matrix factorization,” in Neural Information Processing Systems 2000, pp. 556–562, 2000.
[7] 范綵均, “Enhancing audio style transfer with timbre features based on optimized multi-peak distributions” (in Chinese), Master’s thesis, National Tsing Hua University, 2020.
[8] A. Asesh, “Markov chain sequence modeling,” in 2022 3rd International Informatics and Software Engineering Conference (IISEC), pp. 1–6, IEEE, 2022.
[9] C.-W. Wu, C. Dittmar, C. Southall, R. Vogl, G. Widmer, J. Hockman, M. Müller, and A. Lerch, “A review of automatic drum transcription,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 9, pp. 1457–1483, 2018.
[10] J. Paulus and T. Virtanen, “Drum transcription with non-negative spectrogram factorisation,” in Proc. European Signal Processing Conference (EUSIPCO), pp. 556–562, 2005.
[11] C. Dittmar and D. Gärtner, “Real-time transcription and separation of drum recordings based on NMF decomposition,” in Proc. Intl. Conf. Digital Audio Effects (DAFx), pp. 187–194, 2014.
[12] C.-W. Wu and A. Lerch, “Drum transcription using partially fixed non-negative matrix factorization,” in 2015 23rd European Signal Processing Conference (EUSIPCO), pp. 1281–1285, 2015.
[13] P. Smaragdis, “Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs,” in Independent Component Analysis and Blind Signal Separation: Fifth International Conference, ICA 2004, Granada, Spain, September 22-24, 2004. Proceedings 5, pp. 494–499, Springer, 2004.
[14] H. Lindsay-Smith, S. McDonald, and M. Sandler, “Drumkit transcription via convolutive NMF,” in International Conference on Digital Audio Effects (DAFx), York, UK, 2012.
[15] A. Roebel, J. Pons, M. Liuni, and M. Lagrange, “On automatic drum transcription using non-negative matrix deconvolution and Itakura-Saito divergence,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 414–418, IEEE, 2015.
[16] P. López-Serrano, C. Dittmar, Y. Özer, and M. Müller, “NMF toolbox: Music processing applications of nonnegative matrix factorization,” in Proceedings of the International Conference on Digital Audio Effects (DAFx), vol. 19, p. 26, 2019.
[17] R. Hennequin, A. Khlif, F. Voituret, and M. Moussallam, “Spleeter: a fast and efficient music source separation tool with pre-trained models,” Journal of Open Source Software, vol. 5, no. 50, p. 2154, 2020.
[18] M. Mauch and S. Dixon, “pYIN: A fundamental frequency estimator using probabilistic threshold distributions,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 659–663, IEEE, 2014.
[19] J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. B. Sandler, “A tutorial on onset detection in music signals,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 1035–1047, 2005.