
Graduate Student: 鄭光丸 (Giambi, Manuel)
Thesis Title: CoolCatAI – Tackling the Automated Jazz Improvisation Task by Learning from its Human Counterpart
Advisor: 蘇豐文 (Soo, Von-Wun)
Committee Members: 郭柏志 (Kuo, Po-Chih); 陳鴻文 (Chen, Hong-Wen)
Degree: Master
Department: College of Electrical Engineering and Computer Science – Institute of Information Systems and Applications
Year of Publication: 2022
Graduation Academic Year: 110 (2021–2022)
Language: English
Number of Pages: 92
Keywords (Chinese, translated): jazz music; artificial intelligence
Keywords (English): Jazz


    Creative tasks are at the cutting edge of machine learning research, and despite many recent advances, automated systems are still far from reaching human levels of proficiency and creativity. Progress in music generation, and in jazz music generation in particular, is slowed by the lack of sizeable high-quality datasets. In this work, we try to mitigate this problem by curating a large symbolic jazz music dataset that can be used for a number of downstream tasks. The dataset contains improvised melodies (solos), each paired and aligned with its corresponding chord progression and original melody.
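    To make the pairing described above concrete, the sketch below models one dataset example in Python. All class and field names here (Note, JazzExample, and so on) are hypothetical illustrations, not the thesis's actual schema.

        from dataclasses import dataclass
        from typing import List, Tuple

        # Hypothetical schema for one dataset example; names and fields
        # are illustrative and not taken from the thesis.

        @dataclass
        class Note:
            pitch: int       # MIDI pitch number (0-127)
            offset: float    # position in the piece, in quarter notes
            duration: float  # length, in quarter notes

        @dataclass
        class JazzExample:
            song_title: str
            chords: List[Tuple[float, str]]  # (offset, chord symbol), e.g. (0.0, "Dm7")
            original_melody: List[Note]      # the lead-sheet melody
            solo: List[Note]                 # improvised melody, aligned to the same chords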
    Furthermore, we design a family of deep learning models (dubbed 'CoolCatAI') to test the hypothesis that learning from the human counterpart of the task we are trying to automate can help us achieve better results. We train these models on the newly created dataset and discuss the results.
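    The table of contents below indicates that CoolCatAI is built on LSTM networks with separate encodings for rhythm, pitch, and chords. The following PyTorch sketch shows one plausible shape for such a model; the layer sizes, vocabulary sizes, and the way chord context is injected are assumptions made for illustration, not the thesis's specification.

        import torch
        import torch.nn as nn

        class JazzLSTM(nn.Module):
            """Illustrative melody model in the spirit of CoolCatAI: embeds
            each step's pitch, duration, and chord ids, then predicts the
            next pitch with an LSTM. All sizes are assumptions, not the
            thesis's actual values."""

            def __init__(self, n_pitches=130, n_durations=32, n_chords=64,
                         embed_dim=64, hidden_dim=256):
                super().__init__()
                self.pitch_emb = nn.Embedding(n_pitches, embed_dim)
                self.duration_emb = nn.Embedding(n_durations, embed_dim)
                self.chord_emb = nn.Embedding(n_chords, embed_dim)
                self.lstm = nn.LSTM(3 * embed_dim, hidden_dim, batch_first=True)
                self.out = nn.Linear(hidden_dim, n_pitches)

            def forward(self, pitches, durations, chords):
                # Each input is a (batch, seq_len) tensor of integer ids.
                x = torch.cat([self.pitch_emb(pitches),
                               self.duration_emb(durations),
                               self.chord_emb(chords)], dim=-1)
                h, _ = self.lstm(x)
                return self.out(h)  # (batch, seq_len, n_pitches) next-pitch logits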
    An analysis of the models' learned embeddings indicates that they have acquired fundamental music-theory concepts, and an objective evaluation of the generated music shows promising results on metrics covering four areas: melody, rhythm, harmony, and creativity. On most of these metrics, our models surpass previous approaches. Finally, subjective evaluation shows that the perceived quality and novelty of the music generated by CoolCatAI are comparable to those of human-improvised music.
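    As one example of what an objective harmony metric can look like, the function below computes a chord-tone ratio: the fraction of melody notes whose pitch class belongs to the chord sounding underneath them. This is a plausible illustration only; the thesis's exact metric definitions may differ.

        def chord_tone_ratio(notes_with_chords):
            """Fraction of melody notes that are chord tones. Each element
            pairs a MIDI pitch with the set of pitch classes (0-11) of its
            accompanying chord. An illustrative harmony metric, not
            necessarily the thesis's exact one."""
            if not notes_with_chords:
                return 0.0
            hits = sum(1 for pitch, chord_pcs in notes_with_chords
                       if pitch % 12 in chord_pcs)
            return hits / len(notes_with_chords)

        # Example: a short phrase over a C major triad (pitch classes 0, 4, 7).
        phrase = [(60, {0, 4, 7}), (62, {0, 4, 7}), (64, {0, 4, 7}), (67, {0, 4, 7})]
        print(chord_tone_ratio(phrase))  # 0.75 -- D (62) is not a chord tone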

    Abstract (Chinese)
    Abstract
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    1 Introduction
      1.1 Thesis Structure
      1.2 Motivation
      1.3 Understanding Improvisation
      1.4 Learning to Improvise
      1.5 Improvisation Context
    2 Background
      2.1 Terminology
        2.1.1 Music Generation
        2.1.2 Improvisation
        2.1.3 Lead Sheet
        2.1.4 Chord Progression
        2.1.5 Melody
        2.1.6 Cycle
      2.2 Related Work
    3 Methodology
      3.1 CoolCatAI
        3.1.1 Network Architecture
        3.1.2 LSTM Networks
      3.2 Rhythm Encoding
        3.2.1 Time-Step Encoding
        3.2.2 Duration Encoding
      3.3 Chord Encoding
        3.3.1 Compressed Chord Encoding
        3.3.2 Fixed Chord Encoding
        3.3.3 Extended Chord Encoding
    4 Dataset
      4.1 Data Curation
        4.1.1 Example Structure
        4.1.2 Data Sources
        4.1.3 Chord Progressions
        4.1.4 Duplicate Melody Removal
        4.1.5 File Names Standardization
        4.1.6 Time Signature Selection
        4.1.7 Original and Improvised Melodies Tagging
        4.1.8 Melody Extraction
        4.1.9 Polyphony Removal
        4.1.10 Melody Alignment
        4.1.11 Metadata Integration
      4.2 Dataset Analysis
        4.2.1 Number of Improvised Examples per Song
        4.2.2 Number of Measures
        4.2.3 Number of Notes
        4.2.4 Note Offset
        4.2.5 Note Duration
        4.2.6 Note Pitch and Pitch Class
        4.2.7 Chord Triad
        4.2.8 Song Key
    5 Experiments
      5.1 Baseline
      5.2 Training
        5.2.1 Input Tensors Creation
        5.2.2 Hyper-Parameters
        5.2.3 Training Termination
      5.3 Generation
        5.3.1 Generation Hyper-Parameters
      5.4 Objective Evaluation
        5.4.1 Melody Metrics
        5.4.2 Rhythm Metrics
        5.4.3 Harmony Metrics
        5.4.4 Creativity Metrics
      5.5 Subjective Evaluation
        5.5.1 Personal Information
        5.5.2 Musical Evaluation
    6 Results
      6.1 Embedding Analysis
        6.1.1 Offset Embedding Analysis
        6.1.2 Duration Embedding Analysis
        6.1.3 Pitch Embedding Analysis
      6.2 Objective Evaluation Results
        6.2.1 Melody Metrics
        6.2.2 Rhythm Metrics
        6.2.3 Harmony Metrics
        6.2.4 Creativity Metrics
      6.3 Subjective Evaluation Results
        6.3.1 Demographics
        6.3.2 Score Analysis
        6.3.3 Human vs Computer
    7 Conclusion
      7.1 Contributions
      7.2 Future Work
    A Dataset
      A.1 Data Sources
    Bibliography

    [1] Gino Brunner, Yuyi Wang, Roger Wattenhofer, and Jonas Wiesendanger. JamBot: Music theory aware chord based generation of polyphonic music with LSTMs, 2017.
    [2] Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, and Yi-Hsuan Yang. MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment, 2017.
    [3] Jon Gillick, Kevin Tang, and Robert Keller. Learning jazz grammars, 2010.
    [4] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
    [5] Gaëtan Hadjeres, François Pachet, and Frank Nielsen. DeepBach: A steerable model for Bach chorales generation, 2017.
    [6] Shunit Haviv Hakimi, Nadav Bhonker, and Ran El-Yaniv. BebopNet: Deep neural models for personalized jazz improvisations, 2020.
    [7] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory, 1997.
    [8] Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, and Douglas Eck. Music Transformer: Generating music with long-term structure, 2018.
    [9] Yu-Siang Huang and Yi-Hsuan Yang. Pop Music Transformer: Beat-based modeling and generation of expressive pop piano compositions, 2020.
    [10] Hsiao-Tzu Hung, Chung-Yang Wang, Yi-Hsuan Yang, and Hsin-Min Wang. Improving automatic jazz melody generation by transfer learning techniques, 2019.
    [11] Hakan Inan, Khashayar Khosravi, and Richard Socher. Tying word vectors and word classifiers: A loss framework for language modeling, 2016.
    [12] George Papadopoulos and Geraint Wiggins. A genetic algorithm for the generation of jazz melodies, 2000.
    [13] Nicholas Trieu and Robert Keller. JazzGAN: Improvising with generative adversarial networks, 2018.
    [14] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017.
    [15] Elliot Waite, Douglas Eck, Adam Roberts, and Dan Abolafia. Project Magenta: Generating long-term structure in songs and stories, 2016.
    [16] Shih-Lun Wu and Yi-Hsuan Yang. The Jazz Transformer on the front line: Exploring the shortcomings of AI-composed music through quantitative measures, 2020.
