
Student: Hung, Lin (洪琳)
Thesis title: Unsupervised sound summarization from an environment based on the Restricted Boltzmann Machine (基於RBM模型以非監督式學習歸納環境中的聲響)
Advisor: Liu, Yi Wen (劉奕汶)
Committee members: Hsu, Jia Lien (徐嘉連); Lin, Chia Wen (林嘉文)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2016
Academic year of graduation: 105 (2016-2017)
Language: English
Number of pages: 37
Keywords (Chinese): 聲響歸納 (sound summarization), 非監督式 (unsupervised)
Keywords (English): RBM, sound summarization
    Machine hearing is an indispensable technology in many current human-computer interaction applications, and enabling computers to learn and recognize the way the human brain does has likewise become a popular research topic in recent years. Among the many machine-learning algorithms, artificial neural networks have been applied widely and effectively in fields such as machine vision and speech recognition.
    Imagine arriving in a new environment with no labeled sound data available: how can machine learning be used to tell a user which sounds occur most frequently over a period of time? Because there are no pre-labeled data with which to train a sound-recognition system in advance, different unsupervised learning algorithms must be combined to summarize the sounds. Applications of unsupervised learning of this kind are relatively rare in other machine-hearing research. Along with proposing this idea, we also experiment with neural networks and clustering algorithms to summarize the sounds that recur in an environment.
    In the simulation experiments of this thesis, we recorded more than ten minutes of audio containing common indoor sounds, such as conversation and sounds made by objects, and designed two target alarm sounds, each occurring for less than 10% of the total duration. After sound-event detection, the audio is Fourier-transformed and passed through a Mel filter bank; the resulting Mel spectrogram serves as the auditory spectral feature. A Restricted Boltzmann Machine (RBM) is used as the training model, and a clustering algorithm then successfully summarizes the several classes of sound spectra that appear frequently in the environment. By listening to these classes, the user can pick out the two recurring alarm sounds and, at the same time, learn what kinds of sounds commonly occur in the environment.
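    The auditory-spectral feature step described above (Fourier transform followed by a Mel filter bank) can be sketched in Python as follows. This is only an illustrative sketch, not code from the thesis: the frame length, hop size, and number of Mel bands are assumed values, and librosa is used merely as a convenient way to compute the short-time Fourier transform and Mel filter bank.

        # Sketch of Mel-spectrogram feature extraction (illustrative parameters).
        import numpy as np
        import librosa

        def mel_spectrogram_frames(wav_path, n_fft=1024, hop_length=512, n_mels=40):
            """Return a (frames x n_mels) matrix of log-Mel features for one recording."""
            y, sr = librosa.load(wav_path, sr=None)              # keep the native sampling rate
            S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                               hop_length=hop_length, n_mels=n_mels)
            log_S = librosa.power_to_db(S, ref=np.max)           # log compression
            return log_S.T                                       # one row per time frame

    The resulting frame matrix is the kind of input that a model such as the RBM sketched after the English abstract would be trained on.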


    Machine listening has played an important role in human-machine interaction applications in recent years. The prospect of making computers imitate the learning ability of the human brain has also become a popular topic with the rise of neural networks. Imagine that we go to a new place where labeled sound data are not available. How can machine learning methods let users know which sound events happen frequently over a period of time? Applications of unsupervised learning of this kind are relatively rare in machine listening research. We propose this idea and also try to use neural networks and other unsupervised algorithms to summarize sound events that happen repeatedly in a place. In the simulation experiments of this thesis, we use self-recorded audio containing common indoor sounds such as people talking and object-collision sounds. Two electrical alarm sounds are also designed as target sound events, each with a duration of less than 10% of the total recording time. First, we apply the Fourier transform to the sound signal and pass the result through a Mel-frequency filter bank to obtain the Mel spectrogram as our feature. A Restricted Boltzmann Machine is chosen as our training model. Finally, we use a clustering algorithm and successfully summarize the spectra that occur repeatedly. The user can distinguish the two target sound events by listening to the summarized sound events.
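    The back end of the pipeline (RBM training with contrastive divergence, then clustering of the learned representations) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis's implementation: the layer sizes, learning rate, and number of clusters are placeholders, and k-means is substituted for the ISODATA algorithm used in the thesis only to keep the example short.

        # Sketch: Bernoulli RBM trained with one-step contrastive divergence (CD-1),
        # followed by clustering of the hidden activations of Mel-spectrogram frames.
        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def train_rbm(V, n_hidden=64, lr=0.01, epochs=20, batch=32):
            """Train a Bernoulli RBM with CD-1 on rows of V (values scaled to [0, 1])."""
            n_visible = V.shape[1]
            W = 0.01 * rng.standard_normal((n_visible, n_hidden))
            b = np.zeros(n_visible)                              # visible bias
            c = np.zeros(n_hidden)                               # hidden bias
            for _ in range(epochs):
                for i in range(0, len(V), batch):
                    v0 = V[i:i + batch]
                    p_h0 = sigmoid(v0 @ W + c)                   # positive phase
                    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
                    p_v1 = sigmoid(h0 @ W.T + b)                 # one Gibbs step back
                    p_h1 = sigmoid(p_v1 @ W + c)
                    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)   # CD-1 update
                    b += lr * (v0 - p_v1).mean(axis=0)
                    c += lr * (p_h0 - p_h1).mean(axis=0)
            return W, b, c

        def summarize(mel_frames, n_clusters=5):
            """Cluster hidden representations; each cluster stands for one recurring sound type."""
            V = (mel_frames - mel_frames.min()) / (mel_frames.max() - mel_frames.min() + 1e-12)
            W, b, c = train_rbm(V)
            H = sigmoid(V @ W + c)                               # hidden activations as features
            return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(H)

    Playing back a few representative frames from each cluster (for example, those nearest each cluster center) would then correspond to the summarization step that the abstract describes.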

    Abstract (Chinese) i
    ABSTRACT ii
    CONTENTS iii
    LIST OF FIGURES v
    LIST OF TABLES vii
    Chapter 1 Introduction 1
      1.1 Motivation 2
      1.2 Main Contribution 4
      1.3 Thesis Organization 4
    Chapter 2 Method 6
      2.1 Feature extraction 6
        2.1.1 End point detection (EPD) 7
        2.1.2 Auditory Spectral Features (ASF) 7
      2.2 Neural Network 9
      2.3 Restricted Boltzmann machine 10
        2.3.1 Gibbs sampling in RBM [11] 12
        2.3.2 Contrastive Divergence (CD) 12
      2.4 Iterative Self-organizing Data Analysis (ISODATA) [13] 15
      2.5 Similarity comparison 17
    Chapter 3 Experiment and Discussion 19
      3.1 Dataset 19
        3.1.1 Electronic Sound Data 19
        3.1.2 Record settings 19
      3.2 System implementation and experimental results 20
        3.2.1 Feature extraction 20
        3.2.2 Restricted Boltzmann machine (RBM) 21
        3.2.3 Unsupervised clustering 24
      3.3 Weight reconstruction method 25
        3.3.1 Pitch estimation 26
        3.3.2 Inverse Mel-Spectrogram 28
    Chapter 4 Conclusion and Future Work 33
      4.1 Conclusion 33
      4.2 Future work 34
    REFERENCE 36

    [1] W. S. McCulloch, W. Pitts, “A logical calculus of ideas immanent in nervous activity,” Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943.
    [2] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “A learning algorithm for Boltzmann machines,” Cognitive Science, vol. 9, no. 1, pp. 147–169, 1985.
    [3] P. Smolensky, “Information processing in dynamical systems: Foundations of harmony theory,” in D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, pp. 194–281, Cambridge, MA: MIT Press, 1986.
    [4] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Comput., vol. 14, pp. 1771–1800, 2002.
    [5] G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.
    [6] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations,” in Proceedings of the International Conference on Machine Learning (ICML), 2009.
    [7] H. Lee, P. Pham, Y. Largman, and A. Y. Ng, “Unsupervised feature learning for audio classification using convolutional deep belief networks,” in Advances in Neural Information Processing Systems 22, pp. 1096–1104, 2009.
    [8] F. Eyben, S. Bock, B. Schuller, and A. Graves, “Universal onset detection with bidirectional long short-term memory neural networks,” in International Society for Music Information Retrieval, pp. 589–594, 2010.
    [9] A. Fischer and C. Igel, “An introduction to restricted Boltzmann machines,” in Proceedings of the 17th Iberoamerican Congress on Pattern Recognition (CIARP), LNCS vol. 7441, pp. 14–36, Springer, 2012.
    [10] S. Haykin, Neural Networks: A Comprehensive Foundation, 1999, Prentice-Hall.
    [11] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Comput., vol. 14, pp. 1771-1800, 2002.
    [12] G. H. Ball and D. J. Hall, ISODATA, A Novel Method of Data Analysis and Pattern Classification, Stanford Research Institute, 1965.
    [13] Freesound, http://www.freesound.org/
    [14] G. E. Hinton, “A practical guide to training restricted Boltzmann machines,” Machine Learning Group University of Toronto Tech. Rep. 2010-003, 2010.
    [15] B.-M. Chen, “Sound reconstruction based on features for sound recognition,” thesis, Department of Electrical Engineering, National Tsing Hua University, 2015.
    [16] B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, no. 5814, pp. 972–976, 2007.
    [17] M. Abe and J. Smith, “Design criteria for simple sinusoidal parameter estimation based on quadratic interpolation of FFT magnitude peaks,” Audio Engineering Society Convention 117, 2004.

    Full-text availability: not authorized for public release (campus network, off-campus network, National Central Library).