Graduate student: | Tsai, Ming-Chia (蔡明嘉) |
---|---|
Thesis title: | Nasal event detection using support vector machine (使用支持向量機演算法之鼻音事件偵測) |
Advisor: | Wang, Hsiao-Chuan (王小川) |
Committee members: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2010 |
Academic year of graduation: | 98 (ROC calendar) |
Language: | Chinese |
Number of pages: | 44 |
Keywords (Chinese): | 鼻音偵測、聲學特徵參數、支持向量機 |
Keywords (English): | Nasal detection, Acoustic parameter, Support vector machine |
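Next-generation automatic speech recognition systems use knowledge-based feature parameters, supplying more representative features for particular sound classes in order to raise detection accuracy. This thesis examines the characteristics of nasals and semivowels, which are easily confused with each other. A wavelet transform is used to compute the energy in each frequency band, and features are extracted by exploiting the fact that nasals and semivowels are dominated by low-frequency components. The features comprise Mel-frequency cepstral coefficients (MFCC), energy ratios, and the variation of the Hilbert envelope (the envelope obtained after a Hilbert transform). The separation achieved by these features is compared, and a Support Vector Machine (SVM) is then used for classification. Once the frames have been classified, the release and closure transition points of each nasal can be located, which gives the speech segment boundaries; the accuracy of this method is also examined.

The experiments use the TIMIT corpus. The nasal detection rate reaches 82%, which is 2 percentage points higher than the 80% obtained by a keyword-spotting framework based on HMM phoneme recognition that uses MFCC + ΔMFCC + ΔΔMFCC + logEnergy + ΔlogEnergy + ΔΔlogEnergy as features. For the release and closure transition points found with the proposed method, the average deviations from the manually labeled marks are 9.74 ms and -8.9 ms, respectively. The false alarm rates for vowels, semivowels, fricatives, affricates, and stops are 2.4%, 1%, 2%, 1%, and 0.2%, respectively, indicating that few misclassifications occur.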
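The step from frame-level classification to transition points, and the comparison against manual labels, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the thesis's implementation: the per-frame labels stand in for the output of an SVM classifier (1 for nasal, 0 otherwise), the 10 ms frame shift is assumed, and the function names `find_nasal_segments` and `mean_signed_error` are hypothetical. The error is taken as detected time minus reference time, which is also an assumption about the sign convention.

```python
from typing import List, Tuple


def find_nasal_segments(labels: List[int],
                        frame_shift_ms: float = 10.0) -> List[Tuple[float, float]]:
    """Return (start_ms, end_ms) for every run of frames labeled 1 (nasal).

    The two boundaries of each run are the points where the frame label
    switches between non-nasal and nasal. frame_shift_ms is an assumed value.
    """
    segments = []
    start = None  # index of the first frame of the current nasal run, if any
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # label switched from non-nasal to nasal
        elif label != 1 and start is not None:
            # label switched from nasal back to non-nasal
            segments.append((start * frame_shift_ms, i * frame_shift_ms))
            start = None
    if start is not None:
        # the utterance ended while still inside a nasal run
        segments.append((start * frame_shift_ms, len(labels) * frame_shift_ms))
    return segments


def mean_signed_error(detected_ms: List[float], reference_ms: List[float]) -> float:
    """Average of (detected - reference) over paired boundary times, in ms."""
    if not detected_ms or len(detected_ms) != len(reference_ms):
        raise ValueError("need two non-empty lists of equal length")
    total = sum(d - r for d, r in zip(detected_ms, reference_ms))
    return total / len(detected_ms)


if __name__ == "__main__":
    # Hypothetical frame labels, standing in for classifier output.
    frame_labels = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
    segments = find_nasal_segments(frame_labels)
    print(segments)

    # Hypothetical manual labels for the same two nasal segments.
    reference_starts = [15.0, 65.0]
    reference_ends = [55.0, 95.0]
    print(mean_signed_error([s for s, _ in segments], reference_starts))
    print(mean_signed_error([e for _, e in segments], reference_ends))
```

A signed average like this can come out positive for one boundary type and negative for the other, as the reported 9.74 ms and -8.9 ms do, because early and late detections do not cancel across the two boundary types.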