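Improvement of 32-bit Embedded Speech Recognition Systems | National Tsing Hua University Electronic Theses and Dissertations Repository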

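Author:	扈均 Hu, Jim
Thesis title:	32位元嵌入式語音辨識系統之改進 Improvement of 32-bit Embedded Speech Recognition Systems
Advisor:	張智星
Committee members:	王逸如, 冀泰石
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science
Publication year:	2012
Academic year of graduation:	100 (ROC calendar)
Language:	English
Pages:	42
Chinese keywords:	embedded (嵌入式), feature extraction (特徵擷取), speech recognition (語音辨識)
Foreign-language keywords:	ASRA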

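This thesis analyzes and improves the MFCC (Mel-Frequency Cepstral Coefficients) feature extraction used by our lab's integer-based recognition engine, which is adapted from HTK (hidden Markov model toolkit).

We propose three improvements: first, we replace the FFT (fast Fourier transform) algorithm; second, we represent the power spectrum produced by the FFT and the Mel-filter bank as logarithmic values; and third, we improve multiplication by using double-length integers to increase precision.

Experimental results show that the proposed methods improve both integer computation precision and recognition rates (by 2-3%) relative to the original integer system. After further testing the effects of overflow during the Viterbi stage and of background noise, we found that neither is directly correlated with recognition rates.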

This thesis analyzes and improves on the accuracy of the MFCCs (Mel-Frequency Cepstral Coefficients) feature extraction currently used in our lab’s fixed-point ASR (automatic speech recognition) system based on HTK (hidden Markov model toolkit).

We propose three methods for improvement: first, changing the FFT (fast Fourier transform) algorithm; second, using a logarithmic representation for the power spectrum after the FFT and for the Mel-filter bank; and third, using double-length integers in multiplication to achieve higher precision.
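The third method can be illustrated with a minimal sketch. This record does not show the thesis code, so everything below is an assumption made for illustration: the Q15 scaling, the function names `q15_mul_preshift` and `q15_mul_wide`, and the choice of a pre-shifting multiply as the single-length scheme to compare against. The sketch only shows why keeping a double-length (64-bit) product and shifting afterwards preserves more low-order bits than shifting the operands first.

```c
#include <stdint.h>

/* Assumed fixed-point format: a real value v is stored as v * 2^15 (Q15).
   This scaling is an illustrative assumption, not taken from the thesis. */
#define Q_BITS 15

/* Single-length scheme (hypothetical baseline): shift the operands first
   (8 + 7 = 15 bits in total) so that the product stays within 32 bits.
   The low-order bits of both operands are discarded before multiplying.
   The caller must ensure the pre-shifted product fits in int32_t.
   Right-shifting negative values is implementation-defined in C. */
static int32_t q15_mul_preshift(int32_t a, int32_t b)
{
    return (a >> 8) * (b >> 7);
}

/* Double-length multiply-then-shift: form the full product in 64 bits,
   then shift once, so only the final result is truncated. */
static int32_t q15_mul_wide(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(product >> Q_BITS);
}
```

For example, with a = 12345 and b = 23456 (about 0.377 and 0.716 in Q15), the exact scaled product is about 8836.8. Under these assumptions `q15_mul_wide` returns 8836, while `q15_mul_preshift` returns 8784.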

Experimental results show that each of the above methods improves both fixed-point computation precision and recognition rates (by 2-3%) over the original fixed-point system. Further experiments on overflow at the Viterbi stage and on background noise show no direct correlation between either effect and recognition rates.

CHAPTER 1    INTRODUCTION    1
1 OBJECTIVE OF RESEARCH    1
2 BACKGROUND    2
3 METHODOLOGY    2
4 CHAPTER SUMMARY    3
CHAPTER 2    RELATED THEORIES    4
1 SPEECH RECOGNITION FLOW    4
2 MFCC    5
2.1    Pre-Emphasis    6
2.2    Hamming Window    7
2.3    Fast Fourier Transform (FFT)    7
2.4    Triangular Band-Pass Filter (TBF)    7
2.5    Discrete Cosine Transform (DCT)    8
2.6    Weighted Cepstrum    9
2.7    Log Energy    9
2.8    Delta Cepstrum Coefficients    9
3 IMPLEMENTATION OF FOURIER TRANSFORM    10
3.1    Mathematical Formulas    10
3.1.1    Discrete Fourier Transform    11
3.1.2    Fast Fourier Transform    11
3.1.3    Radix-2 Cooley-Tukey FFT Algorithm    11
3.2    High Performance Computational Algorithms    12
3.2.1    Real-Valued Sequence with Complex-Valued FFT    13
3.2.2    Real-Valued Sequence with Real-Valued FFT    14
CHAPTER 3    IMPLEMENTATIONS    15
1 ASR/HTK FLOATING-POINT FEATURE EXTRACTION    15
2 ASR FIXED-POINT FEATURE EXTRACTION    15
2.1    Integer Values    16
2.2    Multiplication Operations    16
2.3    Binary Search Log Tables    17
2.4    Bugs and Redundancies    17
2.4.1    Bugs    17
2.4.2    Redundant Log Tables    18
CHAPTER 4    PROPOSED IMPROVEMENTS    19
1 USE OF REAL-VALUED FFT    19
2 DOUBLE LENGTH MULTIPLY-THEN-SHIFT    19
3 LOGARITHMIC POWER SPECTRUM    20
CHAPTER 5    EXPERIMENTS    22
1 CORPUS    22
2 REAL-VALUED FFT EXPERIMENTATION    23
2.1    FFT Multiplication Count Comparison    23
2.2    FFT Spectrum Comparison    24
2.3    Large Frame Set Comparison    27
3 RECOGNITION RATES COMPARISON    27
3.1    Recognition Rates after applying Real-Valued FFT    28
3.2    Recognition Rates after applying Double Length Multiply-Then-Shift    29
3.3    Recognition Rates after applying Logarithmic Power Spectrum    30
3.4    Average Recognition Rate Comparisons of All Types of Feature Extraction Systems    31
3.4.1    Average Recognition Rate Comparison of TangPoem 10 Years Corpus    32
3.4.2    Average Recognition Rate Comparison of iPhone_TieYin Corpus    33
4 ERROR ANALYSIS    34
4.1    Effect of Background Noise Level    35
4.2    Effect of Overflow count    36
5 PROCESSING TIME ON EMBEDDED SYSTEMS    37
5.1    Test System Specifications    38
5.2    Time Consumption Experiment    38
CHAPTER 6    CONCLUSION AND FUTURE WORK    39
1 CONCLUSION    39
2 FUTURE WORK    40
REFERENCES    42
APPENDIX A: DFT TO RADIX-2 FFT FORMULA DERIVATION    A
APPENDIX B: EXCERPT FROM TANGPOEM    B
APPENDIX C: EXCERPT FROM TIEYIN    C
APPENDIX D: EXCERPT FROM IPHONE2012    D

[1] H. V. Sorensen, D. L. Jones, M. T. Heideman and C. S. Burrus, "Real-valued Fast Fourier Transform Algorithms," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-35, no. 6, pp. 849-863, June 1987.
[2] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comp., vol. 19, pp. 297-301, 1965.
[3] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, "Fast Fourier Transform (FFT)," in Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 1992, pp. 504-510.
[4] J. W. Cooley, P. W. Lewis and P. D. Welch, "The fast Fourier transform algorithm: Programming considerations in the calculation of sine, cosine and Laplace transforms," Journal of Sound and Vibration, vol. 12, no. 3, pp. 315-337, 22 Aug. 1969.
[5] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, "FFT of Real Functions, Sine and Cosine," in Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 1992, pp. 504-515.
[6] N. M. Brenner, "Fast Fourier Transform of Externally Stored Data," IEEE Transactions on Audio and Electroacoustics, pp. 128-132, June 1969.
[7] P.-C. Hsueh and J.-S. R. Jang, "Speech Recognition on 32-bit Fixed-point Processors: Implementation & Discussions," NTHU Master Thesis, July 2004.
[8] C.-L. H. Su and J.-S. R. Jang, "Speech Recognition on 32-bit Fixed-point Processors: Implementation & Discussions," NTHU Master Thesis, July 2005.
[9] Y.-H. Chen and J.-S. R. Jang, "Implementation and Improvement of Integer-type FFT for Speech Recognition," NTHU Master Thesis, July 2006.
[10] Y.-C. Chou and J.-S. R. Jang, "Implementation and Improvement of Integer-type FFT for Speech Recognition," NTHU Master Thesis, July 2007.
[11] C.-J. Huang and J.-S. R. Jang, "On the Improvement of Embedded Speech Recognition," NTHU Master Thesis, June 2009.
[12] D. Huggins-Daines, M. Kumar, A. Chan, A. W. Black, M. Ravishankar and A. I. Rudnicky, "PocketSphinx: A free, real-time continuous speech recognition system for hand-held devices," Proceedings of ICASSP, 2006.
[13] Cambridge University Engineering Department, "HTK: Hidden Markov Model Toolkit," [Online]. Available: http://htk.eng.cam.ac.uk/.
[14] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, 1992.
[15] R. Jang, "Audio Speech Recognition On-line Tutorial," [Online]. Available: http://neural.cs.nthu.edu.tw/jang/books/audiosignalprocessing/.

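Full-text release date: full text not authorized for public release (campus network)
Full-text release date: full text not authorized for public release (off-campus network)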
