Graduate student: 何育澤 (Ho, Yu-Tse)
Thesis title: 基於支持向量機之混合聲響辨認 (Methods for Recognizing Concurrent Sounds Based on Support Vector Machines)
Advisor: 劉奕汶 (Liu, Yi-Wen)
Committee members: 吳尚鴻, 康仕仲, 李沛群
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2014
Graduation academic year: 102 (ROC calendar; 2013–2014)
Language: Chinese
Pages: 61
Keywords: sound recognition (聲響辨認), support vector machines (支持向量機)
Sound plays an important role in daily life. If a computer can recognize the sounds of a home environment, this information can be used to infer what is happening at home; when an abnormal sound occurs, people can be notified immediately.
This thesis approaches sound recognition as a pattern recognition problem with a training phase and a testing phase. During training, acoustic features are extracted from sounds to form feature vectors, which, paired with sound-class labels, are used to train a classifier; during testing, feature vectors are extracted in the same way and the classifier identifies the unknown sound. Prior work on single-sound recognition typically trained on recordings containing only the individual target classes; however, when the task is extended to concurrent (mixed) sounds, such classifiers are easily confused by the mixtures and recognition degrades. One way to handle mixtures is to treat each combination as a new single-sound class and leave the rest of the pipeline unchanged, but the number of classes then explodes and recognition becomes complex. Instead, following the idea of multi-labeled classification, this thesis trains on both single and mixed sounds and exploits the fact that support vector machines (SVMs) are binary classifiers to cast recognition as a set of yes/no questions: the system builds one model per single-sound class, and recognizing an unknown sound amounts to testing whether it contains each of those single sounds.
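To make the multi-label scheme concrete, here is a minimal sketch, assuming scikit-learn's SVC as the SVM implementation; the class names, feature dimensions, and randomly generated data are placeholders for illustration, not the thesis's actual features or corpus.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical setup: three single-sound classes. Each clip is summarized by
# one feature vector (e.g., frame-averaged MFCCs). Y is a multi-label matrix:
# Y[i, k] = 1 if clip i contains class k; mixed clips have several ones.
class_names = ["door", "water", "alarm"]
X_train = rng.normal(size=(120, 13))           # placeholder feature vectors
Y_train = rng.integers(0, 2, size=(120, 3))    # placeholder multi-label targets

# One binary SVM per single-sound class, trained on both single and mixed
# sounds, as in the multi-label scheme described above.
models = {}
for k, name in enumerate(class_names):
    clf = SVC(kernel="rbf")
    clf.fit(X_train, Y_train[:, k])
    models[name] = clf

def recognize(x):
    """Ask each per-class binary SVM a yes/no question about one clip."""
    x = np.asarray(x).reshape(1, -1)
    return [name for name, clf in models.items() if clf.predict(x)[0] == 1]

print(recognize(rng.normal(size=13)))  # e.g., ['water'], or [] if no class detected
```

Because each class gets its own yes/no model, an unknown mixture can be assigned several labels at once without enumerating every possible combination as a separate class.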
This thesis compares the recognition performance of the conventional and multi-label training methods on both single and mixed sounds. The experiments show that for single-sound recognition the conventional method yields higher precision while the multi-label method yields higher recall, and that for mixed sounds the multi-label method outperforms the conventional method on every measure.
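For reference, precision and recall over multi-label decisions can be computed as below; the thesis does not specify the averaging scheme here, so this sketch assumes micro-averaging over all (clip, class) yes/no decisions.

```python
import numpy as np

def precision_recall(Y_true, Y_pred):
    """Micro-averaged precision/recall over all (clip, class) decisions.

    Y_true, Y_pred: binary matrices of shape (n_clips, n_classes).
    """
    tp = np.sum((Y_true == 1) & (Y_pred == 1))  # correctly detected labels
    fp = np.sum((Y_true == 0) & (Y_pred == 1))  # false alarms
    fn = np.sum((Y_true == 1) & (Y_pred == 0))  # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: two clips, three classes.
Y_true = np.array([[1, 0, 1], [0, 1, 0]])
Y_pred = np.array([[1, 1, 1], [0, 1, 0]])
print(precision_recall(Y_true, Y_pred))  # (0.75, 1.0)
```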