| Graduate Student | 徐佳瑄 Hsu, Chia-Hsuan |
|---|---|
| Thesis Title | 基於人工智慧使用微調音樂轉換器網路之流行歌曲風格模仿 (AI-based Style Imitation of Pop Songs Using Fine-tuned Music Transformers) |
| Advisors | 蘇黎 Su, Li; 蘇郁惠 Su, Yu-Huei |
| Committee Members | 丁川康 Ting, Chuan-Kang; 楊奕軒 Yang, Yi-Hsuan; 黃郁芬 Huang, Yu-Fen |
| Degree | Master |
| Department | Department of Music, College of Arts |
| Year of Publication | 2022 |
| Academic Year of Graduation | 111 |
| Language | Chinese |
| Number of Pages | 45 |
| Keywords (Chinese) | 人工智慧 (artificial intelligence); 自動和弦識別 (automatic chord recognition); 演算法作曲 (algorithmic composition); 自動作曲 (automatic composition); 預訓練模型 (pre-trained model); 深度學習 (deep learning); 微調 (fine-tuning) |
| Keywords (English) | algorithmic, pre-training |
Abstract
In recent years, as more and more automatic music generation models have been proposed, we observe that they typically share two problems: (1) they are difficult to train for music lovers who are unfamiliar with programming, and they require substantial computing resources and long training times; (2) human composers want to create in a style of their own choosing, yet personalized-input music generation models are rarely discussed in automatic music generation research. We therefore study fine-tuning a pre-trained model on music in the user's preferred style, as a more efficient and more personal way for music lovers and composers to create music. Because fine-tuning has a much lower technical barrier than training an automatic music generation model from scratch, we expect that fine-tuning pre-trained models can become a practical method for many more composers to create music with generation models.
We compare music generated by fine-tuning, on a pop-music dataset, a model whose pre-training style is close to the target (pre-trained on pop music) and a model whose pre-training style is more distant (pre-trained on classical music), and we evaluate the results with a subjective questionnaire. In our experiments, we fine-tune each model with a dataset of only ten songs, and we automate the model-based annotation of the dataset to improve the efficiency of data preprocessing. The questionnaire surveys the two models' ratings on pleasantness, creativity, surprisingness, and helpfulness for music creation; from the analysis of the results, we offer recommendations on how to choose a model for fine-tuning so that automatic composition systems can be used more effectively for music creation.
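To make the fine-tuning workflow described above concrete, here is a minimal PyTorch sketch. It is not the thesis's implementation: `TinyMusicTransformer`, the checkpoint `pretrained_pop.pt`, and the token file `user_songs_tokens.pt` are hypothetical stand-ins for a pre-trained symbolic-music transformer and a ten-song corpus tokenized into event sequences.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class TinyMusicTransformer(nn.Module):
    """Hypothetical stand-in for a pre-trained decoder-only music transformer."""
    def __init__(self, vocab_size=512, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        # Causal mask so each position attends only to earlier events.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.head(self.encoder(self.embed(x), mask=mask))

model = TinyMusicTransformer()
model.load_state_dict(torch.load("pretrained_pop.pt"))   # hypothetical checkpoint
tokens = torch.load("user_songs_tokens.pt")              # hypothetical (N, seq_len) LongTensor

loader = DataLoader(TensorDataset(tokens), batch_size=4, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # small LR: adapt, don't overwrite
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):                                     # a few epochs suffice for ~10 songs
    for (batch,) in loader:
        inputs, targets = batch[:, :-1], batch[:, 1:]      # next-event prediction
        logits = model(inputs)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The essential choices are a low learning rate and only a few epochs, so the model adapts to the personal dataset without overwriting what it learned during pre-training.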
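The automated dataset annotation mentioned above can be illustrated with a much simpler substitute for a learned chord-recognition model: music21's `chordify` collapses simultaneous notes into chord objects, giving a rough chord track for a symbolic file. `song.mid` is a hypothetical input, and this is not the annotation pipeline the thesis uses.

```python
# Rough automatic chord annotation with music21 (illustrative substitute for a
# learned chord-recognition model; "song.mid" is a hypothetical input file).
from music21 import converter

score = converter.parse("song.mid")
chords = score.chordify()   # collapse simultaneous notes into chord objects
for c in chords.recurse().getElementsByClass("Chord"):
    print(f"offset {c.offset}: {c.pitchedCommonName}")
```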
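For the subjective evaluation, one reasonable way to compare the two models' ratings on a given criterion is an unequal-variance (Welch) t-test, sketched below with placeholder Likert scores rather than the thesis's actual questionnaire data.

```python
# Compare listener ratings of the two fine-tuned models on one criterion.
# The scores below are illustrative placeholders, not the thesis's data.
import numpy as np
from scipy import stats

ratings_pop = np.array([4, 5, 3, 4, 4, 5, 3, 4])        # pop-pretrained model
ratings_classical = np.array([3, 4, 3, 3, 4, 2, 3, 4])  # classical-pretrained model

# Welch's t-test does not assume the two groups have equal variances.
t_stat, p_value = stats.ttest_ind(ratings_pop, ratings_classical, equal_var=False)
print(f"means: pop={ratings_pop.mean():.2f}, classical={ratings_classical.mean():.2f}")
print(f"Welch t = {t_stat:.3f}, p = {p_value:.3f}")
```

Repeating the test for each criterion (pleasantness, creativity, surprisingness, helpfulness) shows where the style match between the pre-training corpus and the fine-tuning data actually matters.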