Graduate Student: | 吳晨瑋 Wu, Chen Wei
---|---
Thesis Title: | 生活聲響之自動辨認 Automatic Recognition of Life Sounds
Advisor: | 劉奕汶 Liu, Yi Wen
Committee Members: | 陳倩瑜、張智星
Degree: | Master
Department: | College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: | 2012
Graduating Academic Year: | 100 (ROC calendar)
Language: | Chinese
Number of Pages: | 73
Keywords: | Mel-frequency cepstrum, Gaussian mixture model, log-likelihood ratio test, sound recognition, non-speech, rejection
There are many different kinds of sounds in people's daily environments. Whether they are speech or non-speech, the human ear can identify them by their characteristics and judge what is happening nearby. With advances in technology, sound recognition has gradually become a practical technique, especially speech recognition, and it is increasingly being applied to home safety. Regardless of a user's identity, age, or status, urgent non-speech sounds can occur in the home, and because past work on sound recognition has concentrated mostly on speech and speaker recognition, recognizing everyday life sounds becomes important. If the dangerous sounds that may occur in a residence can be classified and recognized, this not only helps greatly in analyzing the surrounding situation but also increases the sense of security of people living alone.

For the experiments in this thesis, we collected 372 audio files spanning eight classes. They were split evenly into a training database of 186 files and a test database of 186 files, each covering all eight classes, and were used to study and develop recognition methods under both ordinary and noisy conditions. For feature extraction, Mel-scale frequency cepstral coefficients (MFCC) and perceptual features are used to form the feature vector of each file. The classifier uses Gaussian mixture models (GMM) as its front end, augmented with an outlier-rejection mechanism based on the likelihood ratio test (LRT): test files and out-of-dataset files are each compared against the dataset models so that out-of-dataset files are not forced into a wrong class.
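The pipeline described above (per-frame MFCC features, one GMM per sound class, and an LRT-based rejection of out-of-set sounds) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: it assumes librosa for MFCC extraction, scikit-learn's GaussianMixture for the class models, a pooled background model as the alternative hypothesis of the likelihood ratio test, and arbitrary values for the MFCC order, GMM size, and rejection threshold; the perceptual features used in the thesis are omitted.

```python
# Minimal sketch of MFCC features + per-class GMMs + likelihood-ratio rejection.
# File lists, thresholds and model sizes are illustrative assumptions,
# not the values used in the thesis.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

N_MFCC = 13              # cepstral coefficients per frame (assumed)
N_COMPONENTS = 8         # Gaussians per class model (assumed)
REJECT_THRESHOLD = 0.0   # log-likelihood-ratio decision threshold (assumed)

def mfcc_frames(path, sr=16000):
    """Return an (n_frames, N_MFCC) matrix of MFCC feature vectors."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC).T

def train_class_models(files_by_class):
    """Fit one diagonal-covariance GMM per sound class."""
    models = {}
    for label, paths in files_by_class.items():
        X = np.vstack([mfcc_frames(p) for p in paths])
        models[label] = GaussianMixture(
            n_components=N_COMPONENTS, covariance_type="diag",
            random_state=0).fit(X)
    return models

def train_background_model(all_paths):
    """Fit a single GMM on all training data pooled, used here as the
    alternative hypothesis of the likelihood ratio test (an assumption)."""
    X = np.vstack([mfcc_frames(p) for p in all_paths])
    return GaussianMixture(n_components=N_COMPONENTS, covariance_type="diag",
                           random_state=0).fit(X)

def classify_with_rejection(path, models, background):
    """Pick the best-scoring class, but reject the file as out-of-dataset
    when its log-likelihood ratio against the background model is too small."""
    X = mfcc_frames(path)
    scores = {label: gmm.score(X) for label, gmm in models.items()}  # mean log-likelihood per frame
    best = max(scores, key=scores.get)
    llr = scores[best] - background.score(X)
    return best if llr > REJECT_THRESHOLD else None  # None means "rejected"
```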
Three methods are studied for classifying the life-sound files: variance-mean, frame vote, and selected frame vote. When matching the test files against the dataset, the best of the three methods reaches a recognition accuracy of 96.24% under ordinary conditions, and robustness against noise and echo is also evaluated in full. For the outlier-rejection mechanism, an experiment with 120 out-of-dataset files reduces the overall error rate to 19% at best, and a second experiment with another 100 out-of-dataset files reduces it to 23% at best.
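One plausible reading of the frame-vote rule named above, reusing the per-class GMMs from the previous sketch, is shown below: every frame votes for the class whose model scores it highest, and the file takes the majority label. The exact voting and frame-selection criteria of the thesis, including the variance-mean and selected-frame-vote variants, may differ from this sketch.

```python
# Illustrative per-frame voting rule in the spirit of the frame-vote method;
# not the original implementation.
import numpy as np

def frame_vote(X, models):
    """Assign each frame to its best-scoring class GMM, then let the frames
    vote; the file is labelled with the majority class.

    X      : (n_frames, n_features) feature matrix of the test file
    models : dict mapping class label -> fitted GaussianMixture
    """
    labels = list(models)
    # per-frame log-likelihood under every class model: (n_classes, n_frames)
    ll = np.vstack([models[c].score_samples(X) for c in labels])
    votes = ll.argmax(axis=0)                      # winning class index per frame
    counts = np.bincount(votes, minlength=len(labels))
    return labels[int(counts.argmax())]
```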