
Graduate Student: 施德睿 (Derrick Seaven Bol)
Thesis Title: Learning to compose music with mixed genres by genre-fusion games (用流派融合賽局來學習混合音樂流派的作曲)
Advisor: Soo, Von-Wun (蘇豐文)
Committee Members: 朱宏國, 蘇英俊
Degree: Master
Department: Institute of Information Systems and Applications, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 75
Keywords: Music Generation, Style Transfer, Genre Fusion, Adversarial Latent Autoencoders

    Music composition is an evolutionary, creative art that can produce new pieces of music distinct from previous ones. In this work, we view the music composition process as a genre-fusion game in which a new genre evolves from old ones by fusing them together in a subtle way using style transfer, resulting in music that is both musical and a fusion of the previous genres.

    In other words, we view music composition as two games: a musicality game and a genre-fusion game. We propose an ALAE (Adversarial Latent Autoencoder), a GAN-like autoencoder deep learning architecture, to implement this novel idea of music composition. We thus present two variants of our system: 1) a musicality-only system and 2) a musicality-genre-fusion system. In the latter, we adopt a triplet loss function to train the genre-fusion game process.
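
    To make the triplet-loss component concrete, the following is a minimal sketch of the standard triplet loss in PyTorch. The margin value, the Euclidean distance, and the hypothetical encoder that maps bars of music to latent embeddings are illustrative assumptions rather than the exact settings of this work.

        import torch
        import torch.nn.functional as F

        def triplet_loss(anchor, positive, negative, margin=1.0):
            # Standard triplet loss: pull the anchor toward the positive (same-genre)
            # embedding and push it at least margin away from the negative (other-genre) one.
            d_pos = F.pairwise_distance(anchor, positive)  # anchor-to-positive distance
            d_neg = F.pairwise_distance(anchor, negative)  # anchor-to-negative distance
            return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

        # Hypothetical usage with an encoder that embeds pianoroll bars:
        #   a = encoder(pop_bar)        # anchor from the pop dataset
        #   p = encoder(other_pop_bar)  # positive: another pop sample
        #   n = encoder(rock_bar)       # negative: a rock sample
        #   loss = triplet_loss(a, p, n)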

    Our full musicality-genre-fusion system is compared with two baselines: a random-sampling fusion model and the musicality-only model. The musicality-only model uses the same ALAE architecture but focuses only on matching the real music distribution.

    We conducted both objective and subjective evaluation experiments. In the objective experiments, we used various metrics such as reconstruction accuracy, empty-bar ratio, polyphony, tonal distance, and a genre classifier to evaluate our system against the baselines. We found that our proposed musicality-genre-fusion system was able to learn to generate music in a style characterized by a fusion of two genres. The polyphony rate and empty-bar ratio were close to the values of the real data distribution, while the increased tonal distances showed that our system did indeed explore regions beyond the real music data distribution in order to satisfy the additional constraints imposed by the genre-fusion game.
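
    As a rough illustration of two of these metrics, the sketch below computes an empty-bar ratio and a polyphony rate from a binary pianoroll array. The resolution of 48 time steps per bar and the exact thresholds are assumptions for illustration and may differ from the definitions used in this work.

        import numpy as np

        def empty_bar_ratio(pianoroll, steps_per_bar=48):
            # pianoroll: (time_steps, 128) binary array; a bar is empty if no pitch is active in it.
            n_bars = pianoroll.shape[0] // steps_per_bar
            bars = pianoroll[:n_bars * steps_per_bar].reshape(n_bars, steps_per_bar, -1)
            return float(np.mean(bars.sum(axis=(1, 2)) == 0))

        def polyphony_rate(pianoroll, threshold=2):
            # Fraction of active time steps in which at least threshold pitches sound at once.
            notes_per_step = pianoroll.sum(axis=1)
            active = notes_per_step > 0
            if not active.any():
                return 0.0
            return float(np.mean(notes_per_step[active] >= threshold))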

    In the subjective experiments, we found that listeners were able to quickly identify pop and rock genre samples when they came from their respective datasets. However, when asked to characterize samples from the musicality-genre-fusion system, they struggled to pick a specific genre and preferred "a mixture of pop and rock". We also found that both our full musicality-genre-fusion system and the musicality game were able to outperform the baselines used in our work.

    Table of Contents:
    1 Introduction
    2 Related Works
      2.1 Multi-track Symbolic Music Generation
      2.2 Music Style Transfer
    3 Background
      3.1 Musical Instrument Digital Interface
      3.2 Music Representation
      3.3 Multi-Track Music
      3.4 Style Transfer in Music
        3.4.1 Music Style Transfer
        3.4.2 Music Genre Fusion
      3.5 Convolutional Neural Networks (CNNs)
      3.6 Generative Models
        3.6.1 Autoencoders
        3.6.2 Generative Adversarial Networks (GANs)
        3.6.3 The Adversarial Latent Autoencoder (ALAE)
      3.7 The Triplet Loss
    4 Methodology
      4.1 Dataset
      4.2 Baseline Architecture
      4.3 System Overview
      4.4 Musicality Game
      4.5 Genre-Fusion Game
    5 Experiments
      5.1 The Musicality-Only System
      5.2 The Musicality-Genre-Fusion System
        5.2.1 The Musicality Game
        5.2.2 The Genre-Fusion Game
    6 Results
      6.1 Objective Evaluation
        6.1.1 Objective Metrics
        6.1.2 Metrics Comparison
        6.1.3 t-SNE Plots to Visualize Multi-track Samples Embeddings
      6.2 Subjective Evaluation
    7 Conclusion and Future Work
      7.1 Conclusion
      7.2 Future Work
    8 Appendix
      8.1 Subjective Listening Test

