Graduate student: | Yu-Chang Chou (周俞璋) |
---|---|
Thesis title: | Implementation and Improvement of Integer-type FFT for Speech Recognition (整數型態之FFT的實作與改進,及其在語音辨識之應用) |
Advisor: | Jyh-Shing Roger Jang (張智星) |
Oral examination committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science, Industrial Technology R&D Master Program on IC Design |
Year of publication: | 2007 |
Academic year of graduation: | 95 (ROC calendar, i.e., 2006-2007) |
Language: | Chinese |
Number of pages: | 35 |
Keywords: | Speech Recognition (語音辨識) |
相關次數: | 點閱:1 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
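Speech recognition is a popular computer technology today. Its applications appear in current mobile phones and language-learning devices, for example voice dialing on phones and the repeat-and-compare function of language-learning devices. Users of these products care about recognition rate, recognition speed, memory footprint, cost, and portability. Ideally a product would offer a high recognition rate, fast recognition, a small memory footprint, low cost, and easy portability, but these goals cannot all be met at once in practice, so they must be traded off according to user needs.
Most users today prioritize portability, low power consumption, and low cost. As a result, personal mobile processors run at much lower clock rates than PC processors and, to reduce cost, omit expensive floating-point hardware. A speech recognition system ported directly from a PC to such a processor therefore cannot match the performance it achieves on the PC.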
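To deploy speech recognition on personal mobile processors, the original floating-point operations must be converted to integer operations. Only then can the system run on processors without floating-point support, giving users the portability and low cost they expect.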
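The abstract does not state which fixed-point format the thesis uses. As a minimal sketch, assuming a Q15 format (a real value x stored as the integer round(x·2^15)), the float-to-integer conversion and an integer-only multiply might look like this; `to_fixed`, `to_float`, and `fixed_mul` are hypothetical helper names, not functions from the thesis.

```python
# Minimal Q-format fixed-point sketch. Q15 (15 fractional bits) is an
# assumed, illustrative choice, not necessarily the thesis's format.

Q = 15            # number of fractional bits (assumption)
ONE = 1 << Q      # fixed-point representation of 1.0


def to_fixed(x: float, q: int = Q) -> int:
    """Scale a float by 2**q and round to the nearest integer."""
    return int(round(x * (1 << q)))


def to_float(v: int, q: int = Q) -> float:
    """Undo the scaling so a fixed-point value can be inspected."""
    return v / (1 << q)


def fixed_mul(a: int, b: int, q: int = Q) -> int:
    """Multiply two Q-format numbers with integer operations only.

    The raw product carries 2*q fractional bits, so it is shifted right
    by q. Adding half an LSB first rounds to nearest (ties toward
    +infinity) instead of always truncating downward.
    """
    return (a * b + (1 << (q - 1))) >> q


if __name__ == "__main__":
    a, b = to_fixed(0.75), to_fixed(-0.5)
    p = fixed_mul(a, b)
    print(a, b, p, to_float(p))
```

The right shift after the multiply keeps the scale consistent: the product of two Q15 values carries a factor of 2^30, and shifting by 15 bits returns it to Q15. For two 16-bit Q15 operands the intermediate product stays within 2^30 in magnitude, so it fits a 32-bit register.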
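Mel-frequency cepstral coefficients (MFCCs) are the most important features in speech recognition, and among the steps of MFCC feature extraction, the integer-type fast Fourier transform (integer-FFT) introduces the largest error. This thesis therefore focuses on how different scaling factors, applied when converting a floating-point FFT to an integer FFT, affect that error. Based on this analysis, we select the scaling factor that yields the best recognition rate.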
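To illustrate how a scaling factor trades error against intermediate magnitude, the sketch below runs a radix-2 decimation-in-time FFT on integer samples, with each twiddle factor quantized as round(W·2^k) and each product shifted back by k bits, and compares the result with a floating-point DFT. The FFT structure, the single factor k shared by all stages, the 64-point size, and the random input frame are assumptions for illustration; they are not the thesis's configuration or data, and the function names are hypothetical.

```python
import cmath
import math
import random


def bit_reverse_permute(a: list[int]) -> None:
    """Reorder a in place into bit-reversed index order (DIT input order)."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]


def int_fft(xr: list[int], xi: list[int], k: int):
    """Integer radix-2 DIT FFT with twiddles scaled by 2**k.

    Returns (real, imag, peak), where peak is the largest complex-multiply
    result before the right shift, a proxy for overflow risk.
    """
    n = len(xr)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    re, im = list(xr), list(xi)
    bit_reverse_permute(re)
    bit_reverse_permute(im)
    scale = 1 << k
    peak = 0
    size = 2
    while size <= n:
        half = size // 2
        for m in range(half):
            # Twiddle W = exp(-2*pi*i*m/size), quantized to integers.
            # Floats appear only here; a device would read a stored table.
            ang = -2.0 * math.pi * m / size
            wr = round(math.cos(ang) * scale)
            wi = round(math.sin(ang) * scale)
            for start in range(0, n, size):
                p, q = start + m, start + m + half
                # Integer complex multiply x[q] * W (result carries 2**k).
                pr = re[q] * wr - im[q] * wi
                pim = re[q] * wi + im[q] * wr
                peak = max(peak, abs(pr), abs(pim))
                # Remove the scaling; >> floors, like an arithmetic shift.
                tr, ti = pr >> k, pim >> k
                re[q], im[q] = re[p] - tr, im[p] - ti
                re[p], im[p] = re[p] + tr, im[p] + ti
        size *= 2
    return re, im, peak


def dft(x: list[int]) -> list[complex]:
    """Direct O(n^2) floating-point DFT, used only as the reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def max_error(x: list[int], k: int) -> float:
    """Largest |integer FFT - float DFT| over all bins for factor k."""
    re, im, _ = int_fft(x, [0] * len(x), k)
    return max(abs(complex(r, i) - f) for r, i, f in zip(re, im, dft(x)))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical frame: 64 random 16-bit samples, not real speech.
    frame = [random.randint(-32768, 32767) for _ in range(64)]
    for k in (4, 8, 12, 15):
        _, _, peak = int_fft(frame, [0] * len(frame), k)
        print(f"k={k:2d}  max_err={max_error(frame, k):9.2f}  "
              f"peak={peak}  fits_int32={peak < 2**31}")
```

A larger k shrinks the twiddle quantization error, while `peak` grows with k. Python integers cannot overflow, so `fits_int32` only flags where a 32-bit implementation would run out of range.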
In this thesis, we investigate methodologies for porting the computation of floating-point MFCCs to integer arithmetic. In particular, we focus on the implementation of the integer FFT used in integer-MFCC computation. Using a data-driven approach, we examine the scaling-up factors at each stage of the integer-FFT computation. These factors are chosen to balance precision against overflow so that the highest recognition rate is achieved. Moreover, we propose exploiting certain properties of the trigonometric functions to build a minimal lookup table without degrading the recognition rate.
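The abstract does not specify which trigonometric properties the minimal table relies on. One common choice, shown here purely as an assumption, is quarter-wave symmetry: for an N-point FFT only sin(2πm/N) for m = 0..N/4 is stored, and every other sine or cosine value is recovered by index reflection and sign changes. The Q15 scaling and the helper names are hypothetical.

```python
import math

Q = 15  # assumed number of fractional bits for the stored entries


def build_quarter_table(n: int, q: int = Q) -> list[int]:
    """Return round(sin(2*pi*m/n) * 2**q) for m = 0 .. n/4 only.

    The entry for m = n/4 equals 2**q (32768 for Q15), which does not
    fit a signed 16-bit word; a 16-bit table would saturate it to 32767.
    """
    if n <= 0 or n % 4:
        raise ValueError("n must be a positive multiple of 4")
    return [round(math.sin(2 * math.pi * m / n) * (1 << q))
            for m in range(n // 4 + 1)]


def table_sin(table: list[int], n: int, m: int) -> int:
    """Fixed-point sin(2*pi*m/n) recovered from the quarter-wave table."""
    quarter = n // 4
    quadrant, r = divmod(m % n, quarter)
    if quadrant == 0:            # sin(phi)
        return table[r]
    if quadrant == 1:            # sin(pi/2 + phi) = sin(pi/2 - phi)
        return table[quarter - r]
    if quadrant == 2:            # sin(pi + phi) = -sin(phi)
        return -table[r]
    return -table[quarter - r]   # sin(3pi/2 + phi) = -sin(pi/2 - phi)


def table_cos(table: list[int], n: int, m: int) -> int:
    """cos(x) = sin(x + pi/2): shift the index by a quarter period."""
    return table_sin(table, n, m + n // 4)


if __name__ == "__main__":
    n = 256
    table = build_quarter_table(n)
    print(len(table), table_sin(table, n, 64), table_cos(table, n, 128))
```

For N = 256, this sketch stores 65 entries instead of separate 256-entry sine and cosine tables, at the cost of a few comparisons per lookup.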