Graduate student | 劉承泰 Liu, Cheng-tai |
---|---|
Thesis title | 嵌入式語音命令系統的設計與改進 Design and Improvement of an Embedded Voice Command System |
Advisors | 張智星, 張俊盛 |
Oral examination committee | 王新民, 呂仁園 |
Degree | Master |
Department | College of Electrical Engineering and Computer Science, Department of Computer Science |
Publication year | 2013 |
Academic year of graduation | 101 (ROC calendar, i.e. 2012–2013) |
Language | Chinese |
Pages | 55 |
Keywords | Mel-frequency cepstral coefficients, heteroscedastic linear discriminant analysis, speech recognition |
The purpose of this research is to improve the performance of our lab's embedded voice command system. The main goal is to speed up processing while reducing the additional errors that the speed-up introduces.
To achieve this, we reduce the dimension of the features the system has to process, using two methods. The first method directly reduces the dimension of the original 39-dimensional Mel-frequency cepstral coefficient (MFCC) features. The second method first combines features into higher-dimensional vectors and then reduces their dimension with heteroscedastic linear discriminant analysis (HLDA). The floating-point transform matrix is converted to an integer (fixed-point) matrix through a scale factor and stored in the system, so that features can be transformed in real time. Based on the second method, we also run further experiments that compare recognition rates under different settings.
The experimental results show that HLDA-based dimension reduction not only speeds up processing but also effectively reduces the error rate. Its overall recognition rate is higher than that of direct dimension reduction, and under some conditions it even exceeds that of the original 39-dimensional features. However, recognition performance varies with the number of mixture components in the acoustic models and with the analysis method used. We can therefore choose the most suitable way of performing HLDA for a given requirement to obtain the best results.
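The step of converting a floating-point transform matrix to integers through a scale factor, then applying it at runtime, can be illustrated with a minimal sketch. It assumes a scale factor of 2^12 (`SHIFT = 12`), round-to-nearest quantization, and input features that are already integers. The abstract does not state the actual scale factor or rounding scheme. The 2×4 matrix is made up rather than an estimated HLDA transform, and the names `quantize_matrix`, `transform_fixed` and `transform_float` are hypothetical.

```python
# Minimal sketch of a fixed-point feature transform (hypothetical names and values).
# Assumptions, not taken from the thesis: scale factor 2**12, round-to-nearest,
# and input features that are already integers (fixed-point values).

SHIFT = 12            # assumed: scale factor = 2**SHIFT = 4096
SCALE = 1 << SHIFT


def quantize_matrix(a):
    """Convert a floating-point transform matrix to integers via the scale factor."""
    return [[round(v * SCALE) for v in row] for row in a]


def transform_fixed(a_int, x):
    """Integer matrix-vector product, rescaled back by the scale factor.

    Adding SCALE // 2 before the arithmetic right shift rounds each output
    to the nearest integer (ties toward +infinity)."""
    out = []
    for row in a_int:
        # Python ints never overflow; on a 32-bit target the accumulator
        # width would have to be checked.
        acc = sum(a * v for a, v in zip(row, x))
        out.append((acc + SCALE // 2) >> SHIFT)
    return out


def transform_float(a, x):
    """Floating-point reference: y = A x."""
    return [sum(a_ij * v for a_ij, v in zip(row, x)) for row in a]


if __name__ == "__main__":
    # Made-up 2x4 matrix projecting a 4-dimensional feature vector to 2 dimensions.
    A = [[0.5, -0.25, 0.125, 0.0],
         [0.1, 0.2, 0.3, 0.4]]
    x = [100, 200, -50, 30]
    print(transform_fixed(quantize_matrix(A), x))  # [-6, 47]
    print(transform_float(A, x))                   # floating-point reference
```

Choosing a power-of-two scale factor in this sketch turns the division by the scale into a single right shift. That is cheap on a processor without floating-point hardware. The integer outputs match the floating-point reference rounded to the nearest integer.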