
Student: Chu, Pei-Chuan (朱沛全)
Title: An Automated Composition System based on Machine Learning and Music Data Analysis (基於機器學習及音樂數據分析的自動作曲系統)
Advisors: Huang, Chih-Fang (黃志方); Su, Yu-Huei (蘇郁惠)
Committee members: Cheng, Stone (鄭泗東); Su, Li (蘇黎)
Degree: Master
Department: Department of Music, College of Arts
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: Chinese
Pages: 114
Chinese keywords: 自動作曲 (automated composition)
English keywords: Automated Composition
    In recent years, with the rise of artificial intelligence, music composition using techniques such as neural networks and deep learning has become a topic of considerable interest in the AI field. This thesis proposes an automatic melody-generation system based on chord progressions. Given a specified chord progression as input, the system predicts a main melody with a machine learning model and, at the same time, performs automated self-assessment using musical-feature statistics computed from the training data set, producing a filtered melody as its output. Within the system, melody prediction is handled by a recurrent neural network composed of long short-term memory (LSTM) units; the predicted melodies are scored by a filter, and the highest-scoring melody is selected as the output. The thesis also analyzes the musical features of the songs in the training data set and uses the resulting statistics as the filter's scoring criteria. Two machine learning models, an LSTM and a bidirectional LSTM (BLSTM), are implemented, along with a Markov-chain prediction system for comparison. The generated melodies were evaluated with a questionnaire that collected listeners' subjective impressions. The results show that listeners gave above-average ratings when the LSTM and BLSTM were used as prediction models, and that the BLSTM produced the best-performing melodies.
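The self-assessment step described above — scoring candidate melodies against statistics gathered from the training set and keeping the best one — can be sketched as follows. The pitch-class frequencies and the deviation-based score are illustrative assumptions, not the thesis's actual feature set or criteria.

```python
from collections import Counter

# Hypothetical pitch-class statistics from a training set: relative frequency
# of each pitch class (0 = C, ..., 11 = B). Placeholder values, not the
# thesis's measured data.
TRAIN_PC_FREQ = {0: 0.20, 2: 0.15, 4: 0.18, 5: 0.10, 7: 0.17, 9: 0.12, 11: 0.08}

def score_melody(midi_notes):
    """Score a candidate melody by how closely its pitch-class distribution
    matches the training-set statistics (higher is better)."""
    counts = Counter(n % 12 for n in midi_notes)
    total = len(midi_notes)
    # Sum of absolute deviations between candidate and reference distributions.
    deviation = sum(abs(counts.get(pc, 0) / total - TRAIN_PC_FREQ.get(pc, 0.0))
                    for pc in range(12))
    return -deviation  # smaller deviation -> higher score

def pick_best(candidates):
    """Return the candidate melody with the highest filter score."""
    return max(candidates, key=score_melody)
```

A melody that stays within the reference scale scores higher than one that wanders chromatically, which mirrors the role the thesis assigns to its filter.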


    Automated composition has become a topic of great interest in the field of artificial intelligence in recent years. This study proposes an automated composition system based on machine learning and music data analysis. With a chord progression provided as input, the main melody is predicted by machine learning models; at the same time, self-assessment is performed automatically using music-feature statistics derived from the training data set. In this system, melody prediction is carried out by a recurrent neural network composed of long short-term memory (LSTM) units. The predicted melodies are scored by a filter, and the highest-scoring melody is output. The musical characteristics of the songs in the training data set have also been analyzed, and the resulting statistics serve as the filter's scoring criteria. Two machine learning models, an LSTM and a bidirectional LSTM (BLSTM), are implemented; in addition, a system using Markov chains as the prediction model is implemented for comparison. The generated melodies were evaluated through a questionnaire survey collecting subjective listening impressions. The results show that listeners gave above-average scores when the LSTM and BLSTM were used as prediction models, and that the system performed best with the BLSTM as the prediction model.
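Before a chord progression can condition a recurrent model as described above, it has to be turned into fixed-size numeric vectors. The sketch below one-hot encodes chord symbols; the chord vocabulary is a small assumed example, not the thesis's actual preprocessing or vocabulary.

```python
# Hypothetical diatonic chord vocabulary; the thesis's vocabulary may differ.
CHORD_VOCAB = ["C", "Dm", "Em", "F", "G", "Am", "Bdim"]
CHORD_INDEX = {c: i for i, c in enumerate(CHORD_VOCAB)}

def one_hot(chord):
    """Encode a single chord symbol as a one-hot vector."""
    vec = [0.0] * len(CHORD_VOCAB)
    vec[CHORD_INDEX[chord]] = 1.0
    return vec

def encode_progression(chords):
    """Turn a chord progression into a sequence of one-hot vectors,
    i.e. the conditioning input a (B)LSTM melody predictor would consume."""
    return [one_hot(c) for c in chords]
```

Each time step of the recurrent network then receives one such vector, so the predicted melody is conditioned on the chord sounding at that moment.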

    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1: Introduction
        1.1 Motivation
        1.2 Related Work
            1.2.1 Music Composition
            1.2.2 Music Composition with Artificial Intelligence
        1.3 Background
            1.3.1 Chords
            1.3.2 TSD Chord Classification
            1.3.3 Chord Progressions
            1.3.4 Voice Structure
            1.3.5 Chord Tones and Non-Chord Tones
            1.3.6 Musical Instrument Digital Interface (MIDI)
            1.3.7 Pitch Classes
            1.3.8 Chroma Features
            1.3.9 Melodic Similarity
        1.4 Thesis Organization
    Chapter 2: Methods
        2.1 Database Preprocessing
        2.2 Machine Learning Neural Networks
            2.2.1 Recurrent Neural Networks
            2.2.2 Long Short-Term Memory
            2.2.3 Bidirectional Long Short-Term Memory
        2.3 Comparison Method
    Chapter 3: Experiments
        3.1 Models and Workflow
        3.2 Database and Preprocessing
            3.2.1 Database
            3.2.2 Database Preprocessing
            3.2.3 Analysis of Chords in the Database Songs
        3.3 Model Architecture
            3.3.1 LSTM Architecture
            3.3.2 BLSTM Architecture
        3.4 System Input
        3.5 Post-Processing
        3.6 Musical-Feature Analysis of the Database
        3.7 Filter
        3.8 Comparison Model
            3.8.1 Choice of Comparison Model
            3.8.2 Transition Probability Matrix
            3.8.3 Post-Editing of Predicted Melodies
    Chapter 4: Results and Discussion
        4.1 Training Results
            4.1.1 LSTM Convergence Curves
            4.1.2 BLSTM Convergence Curves
            4.1.3 Transition Probability Matrix of the Hidden Markov Model
        4.2 Generated Melodies
            4.2.1 Melodies Generated with the LSTM Model
            4.2.2 Melodies Generated with the BLSTM Model
            4.2.3 Melodies Generated with the Hidden Markov Model
        4.3 Subjective Listening Test
            4.3.1 Participant Backgrounds
            4.3.2 Chord Progressions Used in the Test
            4.3.3 Questionnaire Design
            4.3.4 Statistical Tests
            4.3.5 Analysis of Test Results
    Chapter 5: Conclusions
        5.1 Conclusions
        5.2 Future Work
    References
    Appendix 1: Example Scores from the Training Database
    Appendix 2: Example Music for the Subjective-Test Questionnaire
    Appendix 3: Training Loss Functions
    Appendix 4: Transition Probability Matrices
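The outline above mentions a transition probability matrix for the Markov-chain comparison model. A minimal sketch of how such a matrix can be estimated from note sequences is given below; the first-order formulation and the MIDI-note alphabet are illustrative assumptions, not the thesis's exact construction.

```python
from collections import defaultdict

def transition_matrix(sequences):
    """Estimate first-order Markov transition probabilities
    P(next_note | current_note) from example note sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Count each adjacent pair (current note, next note).
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
            for cur, row in counts.items()}
```

Generation then amounts to sampling each next note from the row of the current note, which is why such a model serves as a simple baseline against the LSTM and BLSTM predictors.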

