
Student: 廖俊祺 (Liao, Jyun-Ci)
Thesis Title: Environmental Sound Event Classification Based on Modulation Spectral Vectors (基於調制頻譜向量之環境聲響事件分類)
Advisor: 劉奕汶 (Liu, Yi-Wen)
Committee Members: 黃元豪 (Huang, Yuan-Hao); 黃朝宗 (Huang, Chao-Tsung)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Publication Year: 2017
Graduation Academic Year: 105
Language: Chinese
Pages: 48
Keywords: modulation spectral vectors, noisy training, environmental sound events, Gaussian mixture model (GMM)
  • The Gaussian mixture model (GMM) is well established in speech and sound recognition systems, but its recognition performance drops sharply under heavy environmental background noise. This thesis proposes combining short-term and long-term feature vectors to raise the recognition rate in highly noisy environments. The short-term features are Mel-frequency cepstral coefficients (MFCCs); the long-term features are modulation spectral vectors (MSVs). MSVs extract the energy envelope of a signal in the frequency domain, and this envelope characteristic effectively resists interference from environmental noise.
    To make the system more robust to noise, this thesis also proposes a training scheme in which the Gaussian mixture models are exposed to noise-corrupted data during training; this helps raise the recognition rate for low-SNR signals. The system was evaluated on eight classes of indoor environmental sound events and achieves over 80% recognition accuracy at 0 dB SNR.


    The Gaussian mixture model (GMM) is well developed for both speech and sound recognition, but it performs poorly in environments with high background noise. This thesis proposes a method that combines short-term and long-term features to overcome this issue. The short-term features are Mel-frequency cepstral coefficients (MFCCs), and the long-term features are modulation spectral vectors (MSVs) computed in the frequency domain. The MSVs capture the envelope information of signals, which makes them robust against heavy noise.
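The long-term features can be pictured with the generic modulation-spectrum recipe: take a short-time spectrogram, track each frequency band's energy envelope over time, and Fourier-transform those envelopes to measure how fast the energy fluctuates. The NumPy sketch below illustrates only this generic idea; the frame sizes, linear band pooling, and the function name are illustrative assumptions, not the exact pipeline or parameters of the thesis.

```python
import numpy as np

def modulation_spectral_vector(x, frame_len=512, hop=256, n_bands=8, n_mod=16):
    """Generic modulation-spectrum recipe:
    1) short-time power spectrum per frame,
    2) pool spectra into coarse frequency bands -> per-band energy
       envelopes sampled at the frame rate,
    3) FFT each envelope along time and keep the low modulation-
       frequency bins, where slowly varying event structure lives."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2  # (frames, bins)
    bands = np.array_split(spec, n_bands, axis=1)             # coarse bands
    env = np.stack([b.sum(axis=1) for b in bands])            # (bands, frames)
    env = env - env.mean(axis=1, keepdims=True)               # drop the DC term
    mod = np.abs(np.fft.rfft(env, axis=1))                    # modulation spectrum
    return mod[:, :n_mod].ravel()

sr = 16000
t = np.arange(sr) / sr
# A 1.5 kHz carrier amplitude-modulated at 4 Hz: strong low-frequency
# envelope structure, which the modulation spectrum should pick up.
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1500 * t)
msv = modulation_spectral_vector(x)
```

With the 62.5 Hz frame rate used here, the band containing the carrier concentrates its modulation energy near the 4 Hz bin, whereas stationary broadband noise spreads its modulation energy thinly; that contrast is the intuition behind the noise robustness of envelope-based features.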
    For robustness against noise, this thesis also proposes training the GMMs on noise-corrupted data. This raises recognition accuracy in low signal-to-noise ratio (SNR) conditions. The method was evaluated on a database of 8 indoor sound event classes and achieves over 80% accuracy at 0 dB SNR.
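The noisy-training scheme requires mixing clean recordings with noise at controlled SNRs before extracting features and fitting the GMMs. A minimal sketch of that mixing step, assuming 1-D float arrays (the function name `mix_at_snr` and the white-noise example are illustrative, not from the thesis):

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the signal-to-noise ratio of `signal + noise`
    equals `snr_db`, then return the mixture (equal-length 1-D arrays)."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR = 10 * log10(Ps / Pn)  =>  required noise power:
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + noise * np.sqrt(target_p_noise / p_noise)

rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # clean "event"
noise = rng.standard_normal(16000)                        # white noise
mix = mix_at_snr(sig, noise, snr_db=0.0)                  # equal powers
```

Each training clip can then be mixed at several SNR levels so that the models already see noise-corrupted feature distributions during EM training.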

    Abstract (Chinese) [iii]
    Abstract [v]
    Acknowledgements [vii]
    1 Introduction [1]
      1.1 Research Motivation [1]
      1.2 Literature Review [2]
      1.3 Research Direction [4]
    2 System Architecture and Methods [5]
      2.1 Signal Preprocessing and Noisy Training [6]
      2.2 Mel-frequency Cepstral Coefficients (MFCCs) [7]
      2.3 Modulation Spectral Vectors (MSVs) [12]
        2.3.1 Unit of Modulation Spectral Vectors [14]
        2.3.2 Long-term Feature Extraction [14]
      2.4 Feature Vector Analysis [16]
        2.4.1 Autocorrelation [16]
        2.4.2 Sum of Modulation Spectral Vectors [17]
      2.5 Gaussian Mixture Models (GMMs) [20]
        2.5.1 Model Description [20]
        2.5.2 Initialization of Model Parameters [20]
        2.5.3 Expectation Maximization (EM) Algorithm [21]
        2.5.4 GMM Training Procedure [26]
    3 Analysis and Discussion [29]
      3.1 Single Sound Event Database [29]
      3.2 Performance Evaluation [30]
      3.3 Comparison of Short-term and Long-term Features [31]
      3.4 Recognition Results with Noisy Training [33]
      3.5 Recognition Results with Sum of MSVs [37]
      3.6 Discussion of Noise Types [40]
    4 Conclusions and Future Work [45]
    References [47]

    [1] B. E. Kingsbury, N. Morgan, and S. Greenberg, “Robust speech recognition using the modulation spectrogram,” Speech Communication, vol. 25, no. 1, pp. 117–132, 1998.
    [2] G. Tzanetakis and P. Cook, “Musical genre classification of audio signals,” IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.
    [3] F. Morchen, A. Ultsch, M. Thies, and I. Lohken, “Modeling timbre distance with temporal statistics from polyphonic music,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 1, pp. 81–90, 2006.
    [4] D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, “Detection and classification of acoustic scenes and events,” IEEE Transactions on Multimedia, vol. 17, no. 10, pp. 1733–1746, 2015.
    [5] D. Barchiesi, D. Giannoulis, D. Stowell, and M. D. Plumbley, “Acoustic scene classification: Classifying environments from the sounds they produce,” IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 16–34, 2015.
    [6] M. H. Moattar and M. M. Homayounpour, “A review on speaker diarization systems and approaches,” Speech Communication, vol. 54, no. 10, pp. 1065–1103, 2012.
    [7] M. Mckinney and J. Breebaart, “Features for audio and music classification,” in Proceedings of the International Symposium on Music Information Retrieval, pp. 151–158, 2003.
    [8] M. A. Hossan, S. Memon, and M. A. Gregory, “A novel approach for MFCC feature extraction,” in Signal Processing and Communication Systems (ICSPCS), 2010 4th International Conference on, pp. 1–5, IEEE, 2010.
    [9] S. Greenberg and B. E. Kingsbury, “The modulation spectrogram: In pursuit of an invariant representation of speech,” in Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on, vol. 3, pp. 1647–1650, IEEE, 1997.
    [10] S. Sukittanon, L. E. Atlas, and J. W. Pitton, “Modulation-scale analysis for content identification,” IEEE Transactions on Signal Processing, vol. 52, no. 10, pp. 3023–3035, 2004.
    [11] 何育澤, “Mixed sound recognition based on support vector machines” (基於支持向量機之混合聲響辨認, in Chinese), National Tsing Hua University, 2014.
    [12] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443–445, 1985.
    [13] A. Oppenheim and R. Schafer, Discrete-time Signal Processing. Prentice-Hall signal processing series, Pearson, 2010.
    [14] H. Hermansky, “Modulation spectrum in speech processing,” in Signal Analysis and Prediction, pp. 395–406, Springer, 1998.
    [15] M. Markaki and Y. Stylianou, “Discrimination of speech from nonspeech in broadcast news based on modulation frequency features,” Speech Communication, vol. 53, no. 5, pp. 726–735, 2011.
    [16] L. Atlas and S. A. Shamma, “Joint acoustic and modulation frequency,” EURASIP Journal on Advances in Signal Processing, vol. 2003, no. 7, p. 310290, 2003.
    [17] G. Evangelopoulos and P. Maragos, “Multiband modulation energy tracking for noisy speech detection,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 2024–2038, 2006.
    [18] J.-H. Bach, B. Kollmeier, and J. Anemüller, “Modulation-based detection of speech in real background noise: Generalization to novel background classes,” in Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 41–44, IEEE, 2010.
    [19] N. H. Sephus, A. D. Lanterman, and D. V. Anderson, “Exploring frequency modulation features and resolution in the modulation spectrum,” in Digital Signal Processing and Signal Processing Education Meeting (DSP/SPE), 2013 IEEE, pp. 169–174, IEEE, 2013.
    [20] J.-M. Ren and J.-S. R. Jang, “Discovering time-constrained sequential patterns for music genre classification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1134–1144, 2012.
    [21] C.-H. Lee, J.-L. Shih, K.-M. Yu, and H.-S. Lin, “Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features,” IEEE Transactions on Multimedia, vol. 11, no. 4, pp. 670–682, 2009.
    [22] M. Markaki and Y. Stylianou, “Voice pathology detection and discrimination based on modulation spectral features,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 1938–1948, 2011.
    [23] S.-C. Lim, S.-J. Jang, S.-P. Lee, and M. Y. Kim, “Music genre/mood classification using a feature-based modulation spectrum,” in Mobile IT Convergence (ICMIC), 2011 International Conference on, pp. 133–136, IEEE, 2011.
    [24] D. Reynolds, “Gaussian mixture models,” Encyclopedia of biometrics, pp. 827–832, 2015.
    [25] D. A. Reynolds and R. C. Rose, “Robust text-independent speaker identification using Gaussian mixture speaker models,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72–83, 1995.
    [26] K. B. Petersen, M. S. Pedersen, et al., “The matrix cookbook,” Technical University of Denmark, vol. 7, p. 15, 2008.
    [27] A. B. Downey, Think complexity: complexity science and computational modeling, ch. 9, p. 91. O’Reilly Media, Inc., 2012.
