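Graduate student: | Chu, Pei-Chuan (朱沛全) |
---|---|
Thesis title: | An Automated Composition System based on Machine Learning and Music Data Analysis (基於機器學習及音樂數據分析的自動作曲系統) |
Advisors: | Huang, Chih-Fang (黃志方); Su, Yu-Huei (蘇郁惠) |
Committee members: | Cheng, Stone (鄭泗東); Su, Li (蘇黎) |
Degree: | Master |
Department: | College of Arts - Department of Music |
Year of publication: | 2020 |
Academic year of graduation: | 108 (ROC calendar) |
Language: | Chinese |
Number of pages: | 114 |
Chinese keywords: | 自動作曲 (automated composition) |
English keywords: | Automated Composition |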
In recent years, with the rise of artificial intelligence, music creation using neural networks and deep learning has become a closely watched topic in the field. This thesis proposes an automated melody generation system based on chord progressions. Given a specified chord progression as input, a machine learning model predicts the main melody, and the system automatically assesses its own output against statistics of musical features drawn from the training data set, so that only filtered melodies are produced. Melody prediction is performed by a recurrent neural network built from long short-term memory (LSTM) units. Each predicted melody is scored by a filter, and the highest-scoring melody is output. The musical features of the songs in the training data set were also analyzed, and the resulting statistics serve as the filter's scoring criteria. Two machine learning models were implemented, LSTM and bidirectional LSTM (BLSTM), along with a prediction system based on Markov chains for comparison. The generated melodies were evaluated through a questionnaire survey that collected listeners' subjective listening impressions. The results show that listeners gave ratings higher than the average when LSTM or BLSTM was used as the prediction model, and that the melodies generated with BLSTM performed best.
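The generate-and-filter loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: a first-order Markov chain conditioned on the current chord stands in for the LSTM/BLSTM predictor, the filter scores a single assumed feature (how often each melodic interval occurs in the training data), and the toy corpus and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus: each song is a list of (chord, MIDI pitch) pairs.
# These values are invented for illustration and are not from the thesis.
CORPUS = [
    [("C", 60), ("C", 64), ("F", 65), ("F", 69), ("G", 67), ("G", 71), ("C", 72)],
    [("C", 64), ("C", 67), ("F", 69), ("F", 65), ("G", 67), ("G", 62), ("C", 60)],
]


def train_markov(corpus):
    """Count transitions (previous pitch, current chord) -> next pitch."""
    table = defaultdict(Counter)
    for song in corpus:
        prev = None
        for chord, pitch in song:
            table[(prev, chord)][pitch] += 1
            prev = pitch
    return table


def interval_stats(corpus):
    """Relative frequency of absolute melodic intervals in the corpus."""
    counts = Counter()
    for song in corpus:
        pitches = [p for _, p in song]
        counts.update(abs(b - a) for a, b in zip(pitches, pitches[1:]))
    total = sum(counts.values())
    return {interval: n / total for interval, n in counts.items()}


def sample_melody(table, chords, rng):
    """Sample one pitch per chord from the transition counts."""
    melody, prev = [], None
    for chord in chords:
        options = table.get((prev, chord))
        if not options:
            # Unseen context: fall back to every pitch observed under this chord.
            options = Counter()
            for (_, seen_chord), counts in table.items():
                if seen_chord == chord:
                    options.update(counts)
        if not options:
            raise ValueError(f"chord {chord!r} never appears in the corpus")
        pitches, weights = zip(*options.items())
        prev = rng.choices(pitches, weights=weights)[0]
        melody.append(prev)
    return melody


def score(melody, stats):
    """Filter score: mean corpus frequency of the melody's intervals."""
    intervals = [abs(b - a) for a, b in zip(melody, melody[1:])]
    if not intervals:
        return 0.0
    return sum(stats.get(iv, 0.0) for iv in intervals) / len(intervals)


def generate_best(chords, n_candidates=20, seed=0):
    """Generate several candidates and return the highest-scoring one."""
    rng = random.Random(seed)
    table = train_markov(CORPUS)
    stats = interval_stats(CORPUS)
    candidates = [sample_melody(table, chords, rng) for _ in range(n_candidates)]
    return max(candidates, key=lambda m: score(m, stats))


if __name__ == "__main__":
    print(generate_best(["C", "F", "G", "C"]))
```

Swapping `sample_melody` for a trained neural predictor would leave the scoring and selection steps unchanged, which is the separation between prediction and filtering that the abstract describes.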