Implementation and Improvement of Integer-type FFT for Speech Recognition

Graduate student:	周俞璋 Yu-Chang Chou
Thesis title:	整數型態之FFT的實作與改進，及其在語音辨識之應用 Implementation and Improvement of Integer-type FFT for Speech Recognition
Advisor:	張智星 Jyh-Shing Roger Jang
Oral defense committee:
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Industrial Technology R&D Master Program on IC Design
Year of publication:	2007
Academic year of graduation:	95
Language:	Chinese
Number of pages:	35
Chinese keywords:	語音辨識
English keywords:	Speech Recognition

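Speech recognition is currently a popular computing technology. It already appears in mobile phones and language-learning devices, for example voice dialing on phones and read-along comparison on language learners. Users of these products care about recognition rate, recognition speed, memory footprint, cost, and portability. Ideally a product would combine a high recognition rate, fast recognition, a small memory footprint, low cost, and easy portability. In practice these goals cannot all be met at once, so they are traded off according to user needs.
Most users now prioritize portability, low power consumption, and low cost. Processors in personal mobile devices therefore run at much lower clock rates than PC processors, and they omit costly floating-point hardware to keep costs down. A speech recognition system built for the PC cannot match its PC performance if it is ported directly to such a processor.
To run a speech recognition system on a personal mobile processor, the original floating-point operations must be converted to integer operations. Only then does the system fit a processor without floating-point hardware while keeping the portability and low cost that users expect.
Mel-frequency cepstral coefficients (MFCCs) are the most important parameters in speech recognition. Within the feature extraction procedure, the integer fast Fourier transform (integer-FFT) stage produces the largest error. This thesis therefore focuses on analyzing how various scaling-up factors affect the error introduced when a floating-point FFT is converted to an integer FFT. Based on that analysis, it identifies the scaling-up factor that gives the best recognition rate.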
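The trade-off described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `int_fft`, `reference_dft`, and `max_error`, the single `twiddle_bits` scaling factor, and the `input_scale` parameter are all assumptions made for illustration. The sketch runs a radix-2 FFT using only integer arithmetic, stores the twiddle factors scaled by a power of two, and reports both the error against a floating-point DFT and the largest intermediate product, which is the quantity that would have to fit in a 32-bit register on a real fixed-point processor.

```python
import cmath
import math


def int_fft(re, im, twiddle_bits):
    """Radix-2 decimation-in-time FFT on Python ints (hypothetical helper).

    Twiddle factors are stored as round(value * 2**twiddle_bits).
    Returns (re, im, peak), where peak is the largest product magnitude
    seen before the right shift.
    """
    n = len(re)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    re, im = list(re), list(im)
    scale = 1 << twiddle_bits
    round_add = scale >> 1  # half an LSB, so the shift rounds to nearest
    cos_t = [round(math.cos(2 * math.pi * k / n) * scale) for k in range(n // 2)]
    sin_t = [round(-math.sin(2 * math.pi * k / n) * scale) for k in range(n // 2)]

    # Bit-reversal permutation of the input.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            re[i], re[j] = re[j], re[i]
            im[i], im[j] = im[j], im[i]

    peak = 0
    half = 1
    while half < n:
        step = n // (2 * half)
        for start in range(0, n, 2 * half):
            for k in range(half):
                wr, wi = cos_t[k * step], sin_t[k * step]
                a, b = start + k, start + k + half
                # Integer complex multiply by the scaled twiddle factor.
                prod_re = re[b] * wr - im[b] * wi
                prod_im = re[b] * wi + im[b] * wr
                peak = max(peak, abs(prod_re), abs(prod_im))
                # Shift right to undo the twiddle scaling.
                tr = (prod_re + round_add) >> twiddle_bits
                ti = (prod_im + round_add) >> twiddle_bits
                re[a], im[a], re[b], im[b] = (
                    re[a] + tr, im[a] + ti, re[a] - tr, im[a] - ti)
        half *= 2
    return re, im, peak


def reference_dft(x):
    """Direct floating-point DFT, used only as the error reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def max_error(signal, input_scale, twiddle_bits):
    """Largest deviation from the floating-point DFT, plus the peak product."""
    ints = [round(s * input_scale) for s in signal]
    re, im, peak = int_fft(ints, [0] * len(ints), twiddle_bits)
    ref = reference_dft(signal)
    err = max(abs(complex(r, i) / input_scale - z)
              for r, i, z in zip(re, im, ref))
    return err, peak


if __name__ == "__main__":
    sig = [math.sin(2 * math.pi * 5.3 * t / 64) for t in range(64)]
    for bits in (4, 8, 12, 14):
        err, peak = max_error(sig, 1024, bits)
        print(bits, err, peak, "fits int32" if peak < 2 ** 31 else "overflows int32")
```

Python integers never overflow, so the sketch only reports the peak product instead of failing. On a processor with 32-bit registers, a larger scaling factor lowers the error but pushes that peak toward the overflow limit, which is the balance the abstract refers to.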

In this thesis, we investigate methods for porting the computation of floating-point MFCCs to integer arithmetic. In particular, we focus on the implementation of the integer-FFT within the integer-MFCC computation. Using a data-driven approach, we closely examine the scaling-up factors at each stage of the integer-FFT computation. These factors are chosen to strike a balance between precision and overflow so that the highest recognition rate can be achieved. Moreover, we propose exploiting properties of trigonometric functions to build a minimal lookup table without degrading the recognition rate.
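One common way to shrink a trigonometric lookup table is quarter-wave symmetry. The abstract does not say which properties the thesis uses, so the following is only a sketch under that assumption; `build_quarter_table`, `table_sin`, and `table_cos` are hypothetical names. Only the first quadrant of the sine is stored as scaled integers, and every other sine or cosine value needed for an N-point FFT is derived from it by index arithmetic and a sign change.

```python
import math


def build_quarter_table(n, bits):
    """Store round(sin(2*pi*k/n) * 2**bits) for k = 0 .. n/4 only."""
    assert n % 4 == 0, "n must be a multiple of 4"
    scale = 1 << bits
    return [round(math.sin(2 * math.pi * k / n) * scale)
            for k in range(n // 4 + 1)]


def table_sin(table, n, k):
    """Scaled integer sin(2*pi*k/n), reconstructed from the quarter table."""
    k %= n
    q = n // 4
    if k <= q:            # first quadrant: direct lookup
        return table[k]
    if k <= 2 * q:        # second quadrant: sin(pi - x) = sin(x)
        return table[2 * q - k]
    if k <= 3 * q:        # third quadrant: sin(pi + x) = -sin(x)
        return -table[k - 2 * q]
    return -table[n - k]  # fourth quadrant: sin(2*pi - x) = -sin(x)


def table_cos(table, n, k):
    """Scaled integer cos(2*pi*k/n), using cos(x) = sin(x + pi/2)."""
    return table_sin(table, n, k + n // 4)


if __name__ == "__main__":
    n, bits = 64, 12
    table = build_quarter_table(n, bits)
    print(len(table), "entries stored instead of", 2 * n)
    print(table_sin(table, n, 16), table_cos(table, n, 0))
```

With this layout an N-point transform needs N/4 + 1 stored entries, where separate full-period sine and cosine tables would need 2N. The reconstructed values are the same rounded integers, so the table size changes but the arithmetic precision does not.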

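Chapter 1 Introduction
1 Motivation
2 Research Direction
3 Chapter Overview
Chapter 2 Basic Theory and Techniques
1 Speech Recognition
1.1 Speech Recognition Flow
1.2 Feature Extraction
1.3 Acoustic Units
1.4 Hidden Markov Models
1.5 Speech Recognition Rules
1.6 Recognition Network
2 Fast Fourier Transform
2.1 Principles of the Fast Fourier Transform
2.2 Converting Floating-Point to Integer Types
2.3 Integer-type FFT1
Chapter 3 Related Work and Improved Method
1 Related Work
1.1 Integer-type FFT2 [2]
1.2 Integer-type FFT3 [5]
1.3 Integer-type FFT4 [9]
2 Improved Method (Integer-type FFT5)
Chapter 4 Experimental Results and Discussion
1 Error Analysis
2 Recognition Rate Analysis
Chapter 5 Conclusions and Future Work
1 Conclusions
2 Future Work
References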

[1] Jyh-Shing Roger Jang, Audio Signal Processing and Recognition (音訊處理與辨識線上教材, online course material), http://neural.cs.nthu.edu.tw/jang/books/audioSignalProcessing/
[2] Ying-Jui Chen and Truong Q. Nguyen, "Integer Fast Fourier Transform", IEEE Transactions on Signal Processing, vol. 50, no. 3, March 2002.
[3] Alan V. Oppenheim, Ronald W. Schafer, with John R. Buck, Discrete-Time Signal Processing, 2nd Edition.
[4] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Second Edition.
[5] Yoshikazu Yokotani, Soontorn Oraintara, Ralf Geiger, Gerald Schuller, and K. R. Rao, "A Comparison of Integer Fast Fourier Transforms for Lossless Coding", International Symposium on Communications and Information Technologies 2004 (ISCIT 2004), Sapporo, Japan, October 26-29, 2004.
[6] Randy Yates, Fixed-Point Arithmetic: An Introduction, Digital Audio Signal Processing, March 3, 2001.
[7] Po-Chien Hsueh, Jyh-Shing Roger Jang, "Embedded Speech Recognition", NTHU Master Thesis, July 2004.
[8] Chin-Lung Hart Su, Jyh-Shing Roger Jang, "Speech Recognition on 32-bit Fixed-point Processors: Implementation & Discussions", NTHU Master Thesis, July 2005.
[9] Yi-Hung Chen, Jyh-Shing Roger Jang, "Improvement and Discussion of MFCC Algorithm on 32-bit Fixed-point Processors", NTHU Master Thesis, July 2006.
[10] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297–301, April 1965.
[11] Chun-Yi Lee, Jyh-Shing Roger Jang, "Speech Evaluation", NTHU Master Thesis, July 2002.
[12] Shiuan-Sung Lin, Jyh-Shing Roger Jang, "Optimization of Viterbi Beam Search in Speech Recognition and Multilingual Speech Recognition", NTHU Master Thesis, July 2002.
[13] Hidden Markov Model Toolkit V3.2, Speech Vision and Robotics Group of the Cambridge University Engineering Department, 2002. (http://htk.eng.cam.ac.uk/)
[14] Lawrence Rabiner, B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.

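Full-text release date: the full text is not authorized for public release (campus network)
Full-text release date: the full text is not authorized for public release (off-campus network)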
