Design and Improvement of an Embedded Voice Command System | National Tsing Hua University Theses and Dissertations Repository

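Graduate student:	劉承泰 (Liu, Cheng-tai)
Thesis title:	嵌入式語音命令系統的設計與改進 (Design and Improvement of an Embedded Voice Command System)
Advisors:	張智星, 張俊盛
Oral defense committee:	王新民, 呂仁園
Degree:	Master
Department:	College of Electrical Engineering and Computer Science, Department of Computer Science
Year of publication:	2013
Academic year of graduation:	101 (ROC calendar)
Language:	Chinese
Number of pages:	55
Keywords (Chinese):	梅爾倒頻譜係數、異質性線性鑑別分析、語音辨識
Keywords (English):	Mel-frequency cepstral coefficients, heteroscedastic linear discriminant analysis, speech recognition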

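The purpose of this thesis is to improve the performance of our laboratory's embedded voice command system. The main focus is to speed up system processing while limiting the additional errors that the speed-up introduces.
The general approach proposed in this thesis is to reduce the feature dimensionality that the system must process, and two methods are considered. The first directly reduces the dimensionality of the 39-dimensional Mel-frequency cepstral coefficients (MFCC). The second merges features and then applies heteroscedastic linear discriminant analysis (HLDA) for dimensionality reduction; the transform matrix is converted to integers by multiplying it by a scale factor and is embedded in the system so that features can be transformed in real time. We also carry out further experiments based on the second method to compare recognition rates under different settings.
The experimental results show that dimensionality reduction with HLDA not only speeds up system processing but also effectively lowers the error rate. Its overall recognition rate is better than that of direct dimensionality reduction, and under some conditions it even exceeds the result obtained with the original 39-dimensional features. However, recognition performance varies with the number of mixture components in the acoustic model and with the way the analysis is carried out. We can therefore choose the HLDA procedure best suited to a given requirement in order to obtain the best results.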

The purpose of this research is to improve the performance of our lab’s embedded voice command system. The goal is to reduce processing time while limiting the additional errors introduced by the speed-up.
To this end, we propose two methods for reducing the feature dimension that the system must process. The first directly reduces the original 39 dimensions. The second applies heteroscedastic linear discriminant analysis (HLDA) after increasing the dimension of the original feature vectors. We then convert the floating-point transform matrix to a fixed-point version by means of a scale factor and store it in the system so that features can be transformed at runtime. Based on the second method, we test different parameter settings.
The experimental results show that the second method (HLDA) outperforms the first method (direct feature reduction). In some cases the second method even performs better than the original system with 39-dimensional features. This result indicates that HLDA can shorten recognition time while also reducing the error rate. However, the results also show that performance varies with the number of mixture components in the acoustic models and with the analysis method. We can therefore choose the most suitable analysis procedure for a given requirement to obtain the best performance.
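The fixed-point conversion of the transform matrix mentioned in the abstract can be illustrated with a minimal sketch. The record does not give the thesis's actual scale factor, matrix size, or rounding rule, so everything below is an assumption for illustration: the function names are hypothetical, the scale factor is taken to be a power of two (`2**frac_bits`), and the toy matrix and feature values are made up rather than taken from the thesis.

```python
# Hypothetical sketch: quantize a float transform matrix and apply it with
# integer-only arithmetic, as one might on a processor without an FPU.
# The power-of-two scale factor and all values are illustrative assumptions.

def quantize_matrix(matrix, frac_bits):
    """Scale each float entry by 2**frac_bits and round to the nearest integer."""
    scale = 1 << frac_bits
    return [[int(round(v * scale)) for v in row] for row in matrix]


def transform_fixed(q_matrix, features, frac_bits):
    """Project an integer feature vector using only integer operations.

    Each output is a multiply-accumulate over one matrix row, followed by an
    arithmetic right shift that removes the scale factor again. In Python,
    >> on a negative integer rounds toward negative infinity.
    """
    out = []
    for row in q_matrix:
        acc = 0
        for w, x in zip(row, features):
            acc += w * x
        out.append(acc >> frac_bits)
    return out


def transform_float(matrix, features):
    """Floating-point reference used to check the quantization error."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


if __name__ == "__main__":
    # Toy 2x4 matrix: reduces a 4-dimensional feature vector to 2 dimensions.
    matrix = [[0.5, -0.25, 0.125, 0.0],
              [0.1, 0.2, -0.3, 0.4]]
    features = [1200, -800, 400, 100]  # already integer (fixed-point) features
    frac_bits = 10                     # assumed scale factor of 2**10

    q_matrix = quantize_matrix(matrix, frac_bits)
    print("fixed:", transform_fixed(q_matrix, features, frac_bits))
    print("float:", transform_float(matrix, features))
```

A larger `frac_bits` reduces the rounding error of the stored matrix but widens the accumulator needed for the multiply-accumulate, which matters on a 32-bit target; the value that balances the two depends on the feature range and matrix size and is not stated in this record.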

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1    Introduction
1 Research Motivation
2 Research Background
3 Research Direction
4 Thesis Overview
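Chapter 2    Related Theory and Background
1 Speech Recognition Pipeline
2 HTK Acoustic Model Training
2.1 Introduction to HTK
2.2 Phonemes and Models
2.3 Hidden Markov Models
3 Mel-Frequency Cepstral Coefficients
3.1 Pre-emphasis
3.2 Framing
3.3 Hamming Window
3.4 Fast Fourier Transform
3.5 Triangular Band-Pass Filters
3.6 Discrete Cosine Transform
3.7 Log Energy
3.8 First- and Second-Order Derivatives
4 Heteroscedastic Linear Discriminant Analysis
4.1 Linear Discriminant Analysis
4.2 Heteroscedastic Linear Discriminant Analysis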
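Chapter 3    Research Methods
1 Training the Best Acoustic Model
1.1 Overview of the Training and Test Corpora
1.2 Training Procedure
1.3 Testing Procedure
1.4 Test Results and Analysis
2 System Bug Fixes
3 MFCC Feature Dimensionality Reduction
3.1 Direct Dimensionality Reduction
3.2 Dimensionality Reduction with HLDA
3.2.1 Feature Merging Method
3.2.2 Fixed-Point Conversion of the Transform Matrix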
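Chapter 4    Experimental Results and Analysis
1 Experiment 1: Direct Dimensionality Reduction
1.1 Purpose
1.2 Procedure
1.3 Results and Analysis
2 Experiment 2: HLDA Dimensionality Reduction Using the Feature Merging Method
2.1 Purpose
2.2 Procedure
2.3 Results and Analysis
3 Experiment 3: HLDA Using Only the EAT Corpus
3.1 Purpose
3.2 Procedure
3.3 Results and Analysis
4 Experiment 4: HLDA Dimensionality Reduction Using the Original 39-Dimensional Features
4.1 Purpose
4.2 Procedure
4.3 Results and Analysis
5 Processing Time on the Embedded System
5.1 Overview of the Test Platform
5.2 Test Results and Analysis
6 Testing on an Omnidirectional Microphone Corpus
6.1 Test Results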
Chapter 5    Conclusions and Future Work
1 Conclusions
2 Future Work
References

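Full-text release (on-campus network): not authorized for public release
Full-text release (off-campus network): not authorized for public release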
