Field | Value |
---|---|
Graduate student | 莊芫綱 Chuang, Yuan-Kang |
Thesis title | 使用異質性線性鑑別分析於特定語料以改進特定應用之語音命令辨識 (On the Use of Heteroscedastic Linear Discriminant Analysis on Task-specific Corpora for Improving Task-specific Speech Command Recognition) |
Advisor | 張智星 |
Oral defense committee | 呂仁園, 江永進 |
Degree | Master |
Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
Year of publication | 2012 |
Academic year of graduation | 100 (ROC calendar, i.e., 2011-2012) |
Language | Chinese |
Pages | 50 |
Keywords | Mel-frequency cepstral coefficients, heteroscedastic linear discriminant analysis, hidden Markov models, speech recognition |
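This thesis focuses on selecting, for a specific application, task-specific data from existing corpora and using them together with existing HMMs to perform a discriminative feature transformation that improves English speech recognition. The feature transformation is implemented with heteroscedastic linear discriminant analysis (HLDA), and the thesis has two main parts: the first is "discriminative feature transformation with task-specific corpora," and the second is "feature merging."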
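"Discriminative feature transformation with task-specific corpora" comprises two methods: "discriminative feature transformation with a small task-specific corpus" and "discriminative feature transformation with a corpus selected for the target task." The first method assumes that task-specific data are scarce and uses the existing HMMs to perform the feature transformation with this small task-specific corpus; the second method instead selects task-relevant data from the existing corpora and performs the feature transformation with the existing HMMs. The two methods differ in whether task-specific data are used.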
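"Feature merging" concatenates the features of neighboring frames to add temporal information to each frame, and then applies feature dimensionality reduction before training the HMMs, with the aim of improving the English speech recognition system.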
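To evaluate the proposed methods, we use sentence recognition as the evaluation criterion. The experiments show that discriminative feature transformation with HLDA gives better recognition performance, and that the feature merging method also outperforms the baseline acoustic models. Combining the two methods yields a recognition rate of 97.49%, the best result in this thesis.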
This research focuses on selecting task-specific training data from an existing corpus and using it, together with existing hidden Markov models (HMMs), to perform a discriminative feature transformation that improves task-specific English speech recognition. The thesis has two parts: the first is task-specific corpus selection for discriminative feature transformation using heteroscedastic linear discriminant analysis (HLDA); the second is feature merging.
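HLDA is only named in the abstract, so the following is a rough, hedged illustration of the technique rather than the thesis's implementation. It estimates a square HLDA matrix by maximum likelihood with a row-by-row update of the kind Gales proposed for semi-tied covariances, assuming diagonal covariances in the transformed space: the first `p` rows (useful dimensions) keep class-specific means and variances, while the remaining rows (nuisance dimensions) share one mean and variance across classes. The class labels `y` (for example, HMM state alignments), the identity initialization, the iteration count and all function names are assumptions.

```python
import numpy as np


def class_statistics(X, y):
    """Per-class frame counts and within-class covariances, plus the total covariance."""
    classes = np.unique(y)
    counts = np.array([np.sum(y == c) for c in classes], dtype=float)
    within = np.stack([np.cov(X[y == c], rowvar=False, bias=True) for c in classes])
    total = np.cov(X, rowvar=False, bias=True)
    return counts, within, total


def hlda_log_likelihood(A, counts, within, total, p):
    """HLDA log-likelihood (up to a constant), diagonal covariances after projection."""
    N = counts.sum()
    _, logabsdet = np.linalg.slogdet(A)
    useful, nuisance = A[:p], A[p:]
    var_u = np.einsum("ik,jkl,il->ji", useful, within, useful)   # (classes, p)
    var_n = np.einsum("ik,kl,il->i", nuisance, total, nuisance)  # (n - p,)
    return (N * logabsdet
            - 0.5 * np.sum(counts[:, None] * np.log(var_u))
            - 0.5 * N * np.sum(np.log(var_n)))


def estimate_hlda(X, y, p, n_iter=10):
    """Row-by-row ML estimate of an n x n HLDA matrix; rows [:p] are the ones kept."""
    n = X.shape[1]
    counts, within, total = class_statistics(X, y)
    N = counts.sum()
    A = np.eye(n)  # assumed identity start; an LDA start is a common alternative
    for _ in range(n_iter):
        for i in range(n):
            a = A[i].copy()
            # Cofactor row i of A, up to a scale factor that cancels in the update.
            c = np.linalg.inv(A)[:, i]
            if i < p:
                # Useful dimension: every class keeps its own variance.
                var = np.einsum("k,jkl,l->j", a, within, a)
                G = np.einsum("j,jkl->kl", counts / var, within)
            else:
                # Nuisance dimension: a single variance shared by all classes.
                G = (N / (a @ total @ a)) * total
            g = np.linalg.solve(G, c)
            # Closed-form maximizer of N*log|c.a| - 0.5*a G a^T for this row.
            A[i] = g * np.sqrt(N / (c @ g))
    return A
```

Reduced features are then `X @ A[:p].T`. Each row update maximizes a lower bound on the likelihood that is tight at the current row, so the objective should not decrease from one sweep to the next.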
Two methods are used to perform discriminative feature transformation with HLDA on task-specific data. The first method assumes that task-specific data are scarce and uses the existing HMMs to perform the transformation with a small task-specific corpus. The second method selects a task-relevant subset of the existing training corpus and performs the transformation with the existing HMMs. The two methods differ in whether task-specific data are used.
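The abstract does not say how the task-relevant subset is chosen. Purely as a hypothetical illustration of the second method's selection step, the sketch below keeps utterances whose transcriptions are mostly covered by the task's command vocabulary. The coverage criterion, the threshold and the names `word_coverage` and `select_task_corpus` are assumptions, not the thesis's selection rule.

```python
def word_coverage(transcript, task_vocab):
    """Fraction of words in a transcript that belong to the task vocabulary."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    return sum(w in task_vocab for w in words) / len(words)


def select_task_corpus(corpus, task_vocab, threshold=0.5):
    """Hypothetical selection rule: keep utterance IDs with enough vocabulary coverage.

    corpus: iterable of (utterance_id, transcript) pairs.
    """
    vocab = {w.lower() for w in task_vocab}
    return [utt_id for utt_id, text in corpus
            if word_coverage(text, vocab) >= threshold]
```

For example, with the vocabulary `{"turn", "on", "off", "the", "light"}`, "turn off the radio" has coverage 3/4 and is kept at the default threshold, while a sentence sharing no words with the commands is dropped.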
The second part of this thesis, feature merging, adds temporal context to each frame by concatenating the features of neighboring frames. HMMs are then trained on the merged features combined with feature dimensionality reduction to improve the English speech recognition system.
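Frame concatenation followed by a linear projection can be sketched as below. The context width `k`, repeating the edge frames at utterance boundaries, and the projection matrix are all assumptions; in practice the projection could be the first rows of a discriminative transform such as LDA or HLDA, but the abstract does not state which reduction the thesis used.

```python
import numpy as np


def stack_frames(feats, k):
    """Concatenate each frame with its k preceding and k following frames.

    feats: (T, d) array -> (T, (2k + 1) * d) array. Boundary frames are
    repeated (edge padding) so every output row has full context.
    """
    T = feats.shape[0]
    padded = np.pad(feats, ((k, k), (0, 0)), mode="edge")
    # Column block j holds frame t + j - k, for j = 0 .. 2k.
    return np.hstack([padded[j:j + T] for j in range(2 * k + 1)])


def reduce_dim(stacked, projection):
    """Apply a (p, D) projection matrix to (T, D) stacked features -> (T, p)."""
    return stacked @ projection.T
```

Assuming 39-dimensional MFCC frames and `k = 2`, each stacked vector has 5 x 39 = 195 dimensions before the projection brings it back down.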
To evaluate the proposed methods, we use the sentence recognition rate as the performance measure. The experimental results show that discriminative feature transformation with HLDA gives better recognition performance, and that feature merging also outperforms the baseline acoustic HMMs. Combining the two methods achieves a recognition rate of 97.49%, the best result in this thesis.
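The sentence recognition rate counts an utterance as correct only when the whole recognized word sequence matches the reference. A minimal sketch, assuming whitespace-tokenized transcripts with no further text normalization; the function name is illustrative:

```python
def sentence_recognition_rate(references, hypotheses):
    """Percentage of utterances whose hypothesis matches the reference exactly."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned")
    if not references:
        raise ValueError("empty evaluation set")
    correct = sum(ref.split() == hyp.split()
                  for ref, hyp in zip(references, hypotheses))
    return 100.0 * correct / len(references)
```

A single wrong word makes the whole sentence count as an error, which is why this measure is stricter than word accuracy.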