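Graduate student: | 扈均 Hu, Jim |
---|---|
Thesis title: | 32位元嵌入式語音辨識系統之改進 Improvement of 32-bit Embedded Speech Recognition Systems |
Advisor: | 張智星 |
Oral defense committee: | 王逸如, 冀泰石 |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Publication year: | 2012 |
Academic year of graduation: | 100 |
Language: | English |
Number of pages: | 42 |
Keywords (Chinese): | 嵌入式 (embedded), 特徵擷取 (feature extraction), 語音辨識 (speech recognition) |
Keywords (English): | ASRA |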
This thesis analyzes and improves the accuracy of the MFCC (mel-frequency cepstral coefficient) feature extraction currently used in our lab's fixed-point ASR (automatic speech recognition) system, which is based on HTK (Hidden Markov Model Toolkit).
We propose three improvements: first, changing the FFT (fast Fourier transform) algorithm; second, using a logarithmic representation for the post-FFT power spectrum and the Mel filter bank; and third, performing multiplication with double-length integers to achieve higher precision.
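The third idea, multiplying with double-length integers, can be illustrated with a minimal sketch. The abstract does not state which fixed-point format the thesis uses, so the 15 fractional bits (`FRAC_BITS`) and all function names below are assumptions chosen for illustration only. The sketch contrasts a multiply that discards low-order bits of the operands before multiplying with one that keeps the full product in a double-length (64-bit) intermediate and shifts afterwards.

```python
# Sketch only: FRAC_BITS and every function name here are hypothetical;
# the abstract does not specify the fixed-point format used in the thesis.
FRAC_BITS = 15
INT64_MAX = (1 << 63) - 1


def to_fixed(x: float) -> int:
    """Convert a real value to an integer with FRAC_BITS fractional bits."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << FRAC_BITS)


def mul_single(a: int, b: int) -> int:
    """Multiply after pre-shifting both operands.

    The two shifts together remove FRAC_BITS bits from the product width,
    at the cost of discarding low-order bits before the multiply.
    """
    half = FRAC_BITS // 2
    return (a >> half) * (b >> (FRAC_BITS - half))


def mul_double(a: int, b: int) -> int:
    """Multiply at full precision in a double-length intermediate, then shift."""
    product = a * b
    # Python integers are unbounded, so check the 64-bit fit explicitly.
    assert abs(product) <= INT64_MAX
    return product >> FRAC_BITS


if __name__ == "__main__":
    x, y = 1.2345, 0.0123
    a, b = to_fixed(x), to_fixed(y)
    print("exact       :", x * y)
    print("pre-shifted :", to_float(mul_single(a, b)))
    print("double-len  :", to_float(mul_double(a, b)))
```

When one operand is small, as in the demo values, pre-shifting leaves it with very few significant bits, whereas the double-length product loses precision only in the final shift.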
Experimental results show that each of these methods improves both fixed-point computation precision and recognition rates (by 2-3%) over the original fixed-point system. Further experiments on overflow at the Viterbi stage and on background noise show that neither has a direct correlation with recognition rates.
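To make "overflow at the Viterbi stage" concrete, here is a small sketch of what can happen when many per-frame scores are accumulated in a 32-bit integer. It is an illustration of the phenomenon only: the abstract does not describe how the thesis's system stores path scores or handles overflow, so the score values and the function names are hypothetical.

```python
# Sketch only: score magnitudes and helper names are hypothetical.
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def wrap_int32(x: int) -> int:
    """Return the value a two's-complement 32-bit register would hold for x."""
    return ((x - INT32_MIN) & 0xFFFFFFFF) + INT32_MIN


def add_wrapping(a: int, b: int) -> int:
    """32-bit addition that wraps around on overflow."""
    return wrap_int32(a + b)


def add_saturating(a: int, b: int) -> int:
    """32-bit addition that clamps to the representable range on overflow."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


def accumulate(scores, add):
    """Sum per-frame scores along one path using the given 32-bit adder."""
    total = 0
    for s in scores:
        total = add(total, s)
    return total


if __name__ == "__main__":
    # Fifty frames of a large negative fixed-point log-likelihood.
    frame_scores = [-50_000_000] * 50
    print("wrapping  :", accumulate(frame_scores, add_wrapping))
    print("saturating:", accumulate(frame_scores, add_saturating))
```

With wrapping addition the accumulated score changes sign once it passes the 32-bit limit, while saturating addition pins it at the minimum. Whether either behavior changes the winning path depends on the system; the thesis reports no direct correlation between such overflow and recognition rate.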