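Graduate student: 扈均 (Hu, Jim)
Thesis title: Improvement of 32-bit Embedded Speech Recognition Systems (32位元嵌入式語音辨識系統之改進)
Advisor: 張智星
Committee members: 王逸如, 冀泰石
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: English
Number of pages: 42
Chinese keywords: embedded, feature extraction, speech recognition
Foreign-language keywords: ASRA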
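  • This thesis analyzes and improves the MFCC (Mel-frequency cepstral coefficient) feature extraction used by our lab's integer-based recognition engine, which was adapted from HTK (hidden Markov model toolkit).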

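    We propose three improvements: first, we replace the FFT (fast Fourier transform) algorithm; second, we represent the power spectrum produced by the FFT and the Mel-filter bank as logarithmic values; and third, we improve the multiplication method by using double-length integers to increase precision.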
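    As a rough illustration of why the choice of FFT algorithm matters for real-valued speech frames, the sketch below computes the spectrum of a real sequence of length N with a single complex FFT of length N/2. This packing trick is a standard textbook technique and is an assumption here; the thesis may use a different real-valued FFT algorithm. The names `fft` and `real_fft_via_half_length` are hypothetical, and floating-point arithmetic is used for readability in place of integer arithmetic.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_fft_via_half_length(x):
    """Spectrum bins 0..N/2 of a real sequence using one N/2-point complex FFT.

    len(x) must be a power of two and at least 2.
    """
    n = len(x)
    half = n // 2
    # Pack even-indexed samples into the real part, odd-indexed into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(half)]
    Z = fft(z)
    out = []
    for k in range(half + 1):
        zk = Z[k % half]
        zc = Z[(half - k) % half].conjugate()
        even_k = (zk + zc) / 2      # DFT of the even-indexed samples
        odd_k = (zk - zc) / 2j      # DFT of the odd-indexed samples
        out.append(even_k + cmath.exp(-2j * cmath.pi * k / n) * odd_k)
    return out


if __name__ == "__main__":
    frame = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 1.5]
    for k, value in enumerate(real_fft_via_half_length(frame)):
        print(k, abs(value) ** 2)  # power spectrum, bins 0..N/2
```

    Only bins 0 to N/2 are returned because the remaining bins of a real signal's spectrum are complex conjugates of these, so the power spectrum needs no more than this half.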

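    Experimental results show that the proposed methods improve both the precision of integer arithmetic and the recognition rate (by 2~3%) relative to the original integer system. Further tests on overflow during the Viterbi stage and on background noise indicate that neither is directly correlated with the recognition rate.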


    This thesis analyzes and improves the accuracy of the MFCC (Mel-frequency cepstral coefficient) feature extraction currently used in our lab’s fixed-point ASR (automatic speech recognition) system, which is based on HTK (hidden Markov model toolkit).

    We propose three methods for improvement: first, switching to a real-valued FFT (fast Fourier transform) algorithm; second, using a logarithmic representation for the power spectrum after the FFT and for the Mel-filter bank; and third, performing multiplication with double-length integers to achieve higher precision.
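    The precision benefit of a double-length intermediate product can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the Q15 format (`Q = 15`), the helper names, and the "shift-then-multiply" baseline are all hypothetical. Python integers are unbounded, so the 64-bit width of the intermediate product is only checked, not enforced by the type.

```python
Q = 15  # assumed number of fractional bits (hypothetical Q-format)
INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def to_fixed(x):
    """Convert a float to a fixed-point integer with Q fractional bits."""
    return int(round(x * (1 << Q)))


def to_float(v):
    """Convert a fixed-point integer with Q fractional bits back to a float."""
    return v / (1 << Q)


def shift_then_multiply(a, b):
    """Hypothetical single-length baseline.

    Low-order bits of both operands are discarded before multiplying so that
    the product stays small; the result is still scaled by 2**Q.
    """
    return (a >> (Q // 2)) * (b >> (Q - Q // 2))


def multiply_then_shift(a, b):
    """Double-length approach: form the full product, then shift once."""
    wide = a * b  # would be held in a 64-bit integer on a 32-bit target
    if not INT64_MIN <= wide <= INT64_MAX:
        raise OverflowError("product does not fit in 64 bits")
    return wide >> Q


if __name__ == "__main__":
    a, b = to_fixed(1.2345), to_fixed(0.6789)
    exact = 1.2345 * 0.6789
    print("exact              ", exact)
    print("shift-then-multiply", to_float(shift_then_multiply(a, b)))
    print("multiply-then-shift", to_float(multiply_then_shift(a, b)))
```

    Shifting after the multiplication keeps every operand bit in play until the final truncation, which is why its error stays near the resolution of the format, whereas pre-shifting loses roughly half of the fractional bits of each operand.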

    Experimental results show that each of the above methods improves both fixed-point computation precision and recognition rates (by 2~3%) over the original fixed-point system. Further experiments on overflow at the Viterbi stage and on background noise show no direct correlation between these factors and recognition rates.

    CHAPTER 1 INTRODUCTION
      1.1 OBJECTIVE OF RESEARCH
      1.2 BACKGROUND
      1.3 METHODOLOGY
      1.4 CHAPTER SUMMARY
    CHAPTER 2 RELATED THEORIES
      2.1 SPEECH RECOGNITION FLOW
      2.2 MFCC
        2.2.1 Pre-Emphasis
        2.2.2 Hamming Window
        2.2.3 Fast Fourier Transform (FFT)
        2.2.4 Triangular Band-Pass Filter (TBF)
        2.2.5 Discrete Cosine Transform (DCT)
        2.2.6 Weighted Cepstrum
        2.2.7 Log Energy
        2.2.8 Delta Cepstrum Coefficients
      2.3 IMPLEMENTATION OF FOURIER TRANSFORM
        2.3.1 Mathematical Formulas
          2.3.1.1 Discrete Fourier Transform
          2.3.1.2 Fast Fourier Transform
          2.3.1.3 Radix-2 Cooley-Tukey FFT Algorithm
        2.3.2 High Performance Computational Algorithms
          2.3.2.1 Real-Valued Sequence with Complex-Valued FFT
          2.3.2.2 Real-Valued Sequence with Real-Valued FFT
    CHAPTER 3 IMPLEMENTATIONS
      3.1 ASR/HTK FLOATING-POINT FEATURE EXTRACTION
      3.2 ASR FIXED-POINT FEATURE EXTRACTION
        3.2.1 Integer Values
        3.2.2 Multiplication Operations
        3.2.3 Binary Search Log Tables
        3.2.4 Bugs and Redundancies
          3.2.4.1 Bugs
          3.2.4.2 Redundant Log Tables
    CHAPTER 4 PROPOSED IMPROVEMENTS
      4.1 USE OF REAL-VALUED FFT
      4.2 DOUBLE LENGTH MULTIPLY-THEN-SHIFT
      4.3 LOGARITHMIC POWER SPECTRUM
    CHAPTER 5 EXPERIMENTS
      5.1 CORPUS
      5.2 REAL-VALUED FFT EXPERIMENTATION
        5.2.1 FFT Multiplication Count Comparison
        5.2.2 FFT Spectrum Comparison
        5.2.3 Large Frame Set Comparison
      5.3 RECOGNITION RATES COMPARISON
        5.3.1 Recognition Rates after applying Real-Valued FFT
        5.3.2 Recognition Rates after applying Double Length Multiply-Then-Shift
        5.3.3 Recognition Rates after applying Logarithmic Power Spectrum
        5.3.4 Average Recognition Rate Comparisons of All Types of Feature Extraction Systems
          5.3.4.1 Average Recognition Rate Comparison of TangPoem 10 Years Corpus
          5.3.4.2 Average Recognition Rate Comparison of iPhone_TieYin Corpus
      5.4 ERROR ANALYSIS
        5.4.1 Effect of Background Noise Level
        5.4.2 Effect of Overflow Count
      5.5 PROCESSING TIME ON EMBEDDED SYSTEMS
        5.5.1 Test System Specifications
        5.5.2 Time Consumption Experiment
    CHAPTER 6 CONCLUSION AND FUTURE WORK
      6.1 CONCLUSION
      6.2 FUTURE WORK
    REFERENCES
    APPENDIX A: DFT TO RADIX-2 FFT FORMULA DERIVATION
    APPENDIX B: EXCERPT FROM TANGPOEM
    APPENDIX C: EXCERPT FROM TIEYIN
    APPENDIX D: EXCERPT FROM IPHONE2012


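    Full-text release date: the full text is not authorized for public release (on-campus network)
    Full-text release date: the full text is not authorized for public release (off-campus network)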
