
Graduate Student: Zheng, Yu-Fang (鄭語芳)
Thesis Title: From J-Pop to Chiptune: Design and Implementation of a Conversion System Based on Pitch Detection and Matrix Decomposition
(將日本流行歌曲轉換為晶片音樂風格 - 結合音高偵測與矩陣分解之轉換系統設計與實作)
Advisor: Liu, Yi-Wen (劉奕汶)
Committee Members: Su, Li (蘇黎); Huang, Yuan-Hao (黃元豪); Cheng, Chiung-Ying (程瓊瑩)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2024
Graduation Academic Year: 113 (ROC calendar)
Language: Chinese
Number of Pages: 52
Chinese Keywords: 曲風轉換 (music style transfer), 非負矩陣分解 (non-negative matrix factorization), 音高偵測 (pitch detection), 峰值偵測 (peak detection)
English Keywords: Music style transfer, Non-negative Matrix Factorization, Pitch detection, Peak detection
This study aims to develop an automated system for converting Japanese pop songs into chiptune music. Chiptune is a music style that originated from 1980s video-game music and is characterized by simple timbres and a nostalgic atmosphere. With the recent popularity of retro styles, many music creators have rearranged modern pop songs in the chiptune style to offer a distinctive listening experience. Converting pop music into chiptune by hand, however, requires a certain level of music-theory knowledge and costs considerable time and effort. This study therefore proposes an automated conversion system that reduces the complexity of the conversion process and improves the quality of the results.

The system combines two main techniques: pitch detection and Non-negative Matrix Factorization (NMF). The original recording is first separated into vocals, bass, piano, drums, and other parts with the open-source source separation model Spleeter. Pitch detection is then applied to the vocals and the bass separately, the detected notes are regenerated with chiptune-style waveforms, and the result is synthesized in the time domain. The piano and the remaining sounds are decomposed with NMF, and the conversion is achieved by replacing the template matrix. For the drums, NMF is used to further subdivide the drum sounds into kick, snare, hi-hat, and cymbals; a peak detection algorithm locates the main hits, which are then replaced with typical chiptune drum sounds.
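
For illustration only, the following Python sketch shows how the vocal or bass step described above could look in practice. It assumes a monophonic stem that has already been separated by Spleeter's 5-stem model (for example with spleeter.separator.Separator('spleeter:5stems').separate_to_file(...)) and saved as "vocals.wav"; the pYIN implementation from librosa, the hop size, the fixed amplitude, and the 50% duty cycle are illustrative assumptions rather than the settings used in the thesis.

```python
import numpy as np
import librosa
import soundfile as sf
from scipy.signal import square

# Load a monophonic stem (assumed to be the Spleeter-separated vocal track).
y, sr = librosa.load("vocals.wav", sr=44100, mono=True)
hop = 512

# Frame-wise fundamental frequency with pYIN; unvoiced frames come back as NaN.
f0, voiced, _ = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C6"),
    sr=sr,
    hop_length=hop,
)

# Regenerate each voiced frame as a square wave (a typical chiptune pulse timbre),
# keeping the phase continuous across frame boundaries; unvoiced frames stay silent.
out = np.zeros(len(y))
phase = 0.0
for i, freq in enumerate(f0):
    start, end = i * hop, min((i + 1) * hop, len(y))
    if end <= start:
        break
    if voiced[i] and not np.isnan(freq):
        t = np.arange(end - start) / sr
        out[start:end] = 0.3 * square(2 * np.pi * freq * t + phase, duty=0.5)
        phase += 2 * np.pi * freq * (end - start) / sr

sf.write("vocals_chiptune.wav", out, sr)
```

In the same spirit, the bass stem could be handled with a lower pitch-search range and, for example, a triangle waveform, which is the timbre commonly associated with chiptune bass lines.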

By combining pitch detection with matrix decomposition, this study converts Japanese pop songs into the chiptune style, preserving the structure of the original song while presenting a distinctive retro electronic sound. Finally, 35 participants were invited to evaluate the conversion results.


This research aims to develop an automated system to transform Japanese pop songs into the chiptune style. Chiptune, a music style originating from the electronic game music of the 1980s, is characterized by its simple sound features and nostalgic ambience. With the recent popularity of retro styles, many music creators have adapted modern pop music into the chiptune style to offer a unique auditory experience. However, manually converting pop music into chiptune requires a certain level of knowledge of music theory and is time-consuming. Therefore, this study proposes an automated conversion system to simplify the transformation process and enhance the quality of the results.

The system primarily integrates two techniques: pitch detection and Non-negative Matrix Factorization (NMF). Using the source separation model Spleeter, the original music is first split into vocals, bass, piano, drums, and other components. Pitch detection is then applied to the vocals and the bass separately, and the detected melodies are regenerated with chiptune waveforms and resynthesized in the time domain. For the piano and the remaining accompaniment, NMF is used to decompose the spectrogram, and the conversion is achieved by replacing the template matrix. The drum track is further subdivided into kick, snare, hi-hat, and cymbals using NMF; a peak detection algorithm then identifies the main hits, which are replaced with typical chiptune drum sounds.
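
As a rough sketch of the drum step, the following Python code factorizes the separated drum stem with plain NMF (the thesis uses an NMFD variant), peak-picks each activation curve, and overlays a short decaying noise burst, loosely imitating the NES noise channel, at every detected hit. The file name "drums.wav", the number of components, the thresholds, and the synthesized drum sound are all assumptions made for illustration, and the four components are not guaranteed to align with kick, snare, hi-hat, and cymbals.

```python
import numpy as np
import librosa
import soundfile as sf
from sklearn.decomposition import NMF
from scipy.signal import find_peaks

# Load the separated drum stem and compute its magnitude spectrogram.
y, sr = librosa.load("drums.wav", sr=44100, mono=True)
hop = 512
V = np.abs(librosa.stft(y, n_fft=2048, hop_length=hop))

# Factorize V ~= W @ H: columns of W are spectral templates, rows of H are activations.
model = NMF(n_components=4, init="nndsvd", max_iter=400, random_state=0)
W = model.fit_transform(V)   # shape (n_freq_bins, 4)
H = model.components_        # shape (4, n_frames)

out = np.zeros(len(y))
burst_len = int(0.08 * sr)
t = np.arange(burst_len) / sr
for k in range(H.shape[0]):
    act = H[k] / (H[k].max() + 1e-12)
    # Peak-pick the activation curve to locate the main hits of this component.
    peaks, _ = find_peaks(act, height=0.3, distance=int(0.1 * sr / hop))
    # Replace each detected hit with a decaying noise burst, scaled by its activation.
    burst = np.random.uniform(-1.0, 1.0, burst_len) * np.exp(-40.0 * t)
    for p in peaks:
        start = p * hop
        end = min(start + burst_len, len(out))
        out[start:end] += act[p] * burst[: end - start]

out /= max(np.abs(out).max(), 1e-12)   # simple peak normalization
sf.write("drums_chiptune.wav", out, sr)
```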

In short, this research combines pitch detection and matrix decomposition to transform Japanese pop songs into the chiptune style, preserving the original song structure while presenting a distinctive retro electronic sound. Finally, 35 participants were invited to evaluate the conversion results.

Table of Contents
1 Introduction
  1.1 Research Motivation
  1.2 Literature Review
    1.2.1 Research and Development of Chiptune Music
    1.2.2 Style Transfer Related to Chiptune Music
    1.2.3 Drum Transcription
  1.3 Research Direction
  1.4 Main Contributions
  1.5 Thesis Organization
2 Introduction to Chiptune Music
  2.1 Historical Origins of Chiptune
  2.2 Characteristics of Chiptune
  2.3 Cultural Impact
  2.4 Existing Creation Tools
    2.4.1 FamiTracker
    2.4.2 FamiStudio
    2.4.3 Game Boy & LSDj
3 System Architecture
  3.1 Vocal Conversion
    3.1.1 Silence Detection
    3.1.2 Pitch Detection and Processing
    3.1.3 Time-Domain Synthesis
  3.2 Bass Conversion
  3.3 Polyphonic Conversion
  3.4 Drum Conversion
    3.4.1 Segment-wise NMFD
    3.4.2 Onset Detection
    3.4.3 Sound Replacement and Amplitude Adjustment
4 Conversion Results and Discussion
  4.1 Questionnaire and Experimental Design
  4.2 Individual Conversion Results
  4.3 Comparison of the Overall Conversion with the Original Songs
  4.4 Comparison of the Overall Conversion with Versions Created Online
5 Conclusion
6 Future Work
  6.1 Improving the Naturalness of the Vocal Conversion
  6.2 Applying the System to Songs of Other Genres
  6.3 Developing a System Interface
References
Appendix
  A.1 Listening Test Questionnaire
  A.2 Committee Members' Suggestions
    A.2.1 Prof. Huang, Yuan-Hao
    A.2.2 Prof. Cheng, Chiung-Ying
    A.2.3 Prof. Su, Li

