
Graduate student: Chin-Lung Hart Su (蘇金龍)
Thesis title: Speech Recognition on 32-bit Fixed-point Processors: Implementation & Discussions (32位元定點運算處理器之語音辨識:實作與探討)
Advisor: Jyh-Shing Roger Jang (張智星)
Committee members:
Degree: Master
Department: Department of Computer Science, College of Electrical Engineering and Computer Science
Year of publication: 2005
Graduation academic year: 93 (ROC calendar)
Language: English
Number of pages: 28
Keywords: speech recognition, fixed-point computation, embedded system, speedup, parameter estimation, MFCC, HMM, token-passing algorithm
Abstract (Chinese):
    As technology advances, more and more electronic devices are capable of speech recognition; examples include voice dialing on cell phones, voice commands on PDAs, and spoken ID-number verification at bank telephone switchboards. However, because low-end devices have limited computing power and lack floating-point arithmetic, such applications are not yet widespread.
    This study proposes a series of solutions to these constraints and implements a speech recognition engine on a 32-bit embedded system. The recognition process can be roughly divided into three parts: extracting speech features, building the acoustic model, and searching for the most likely path. All three involve a large amount of floating-point computation, so our main approach is to convert most of the system's arithmetic to fixed-point and to use look-up tables to speed up some of the steps. At the cost of roughly a 3% drop in recognition rate, the system speeds up feature extraction by about 6.7 times and the search for the most likely path by about 4.1 times.
    In addition, we point out issues that arise when tuning the parameters used during recognition. For example, the pruning beam-width must balance recognition rate against recognition speed, and the scaling factor must be chosen with arithmetic overflow in mind. To make overflow easier to detect, we designed a new C++ class for this purpose. We hope these considerations will be taken into account when a complete parameter-tuning procedure is designed in the future.


Abstract (English):
    Due to the advancement of modern technologies, more and more digital devices are capable of recognizing speech for various applications, such as voice dialing on cell phones, voice commands on PDAs, and voice-based telephone operators. However, the lack of floating-point arithmetic and the limited computing power of these mobile devices constrain the domain of speech-based applications.
    This study proposes several methods to overcome these constraints. We have also implemented a recognition system on a 32-bit processor to show the feasibility of the proposed approach. In general, the process of speech recognition can be divided into three steps: feature extraction, acoustic model construction (training), and the Viterbi search for the most likely path (recognition). Since all of these time-consuming steps rely heavily on floating-point arithmetic, one straightforward way to reduce computation time is to use fixed-point operations instead. Moreover, we built look-up tables to speed up the evaluation of some mathematical functions. Feature extraction is about 6.7 times faster and Viterbi decoding is about 4.1 times faster than their floating-point counterparts, while the recognition rate drops only about 3%.
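    To make the fixed-point conversion concrete, the sketch below shows Q16.16 arithmetic on a 32-bit integer together with a precomputed cosine table of the kind that can replace trigonometric calls in the DCT step of MFCC. The Q16.16 format, the table size, and all identifiers are illustrative assumptions; this record does not specify the thesis' actual representation.

#include <cstdint>
#include <cstdio>
#include <cmath>

// Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fractional bits,
// stored in a plain 32-bit integer (one common choice on 32-bit CPUs).
typedef int32_t q16_16;
const int FRAC_BITS = 16;
const double PI = 3.14159265358979323846;

inline q16_16 to_fixed(double x)  { return (q16_16)(x * (1 << FRAC_BITS)); }
inline double to_double(q16_16 x) { return (double)x / (1 << FRAC_BITS); }

// Fixed-point multiply: widen to 64 bits, then shift back down
// (an arithmetic right shift is assumed for negative values).
inline q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * (int64_t)b) >> FRAC_BITS);
}

// Look-up table for cos(pi * i / N), built once at start-up with floating
// point; afterwards only integer operations are needed.
const int N = 256;
q16_16 cos_table[N + 1];

void build_cos_table() {
    for (int i = 0; i <= N; ++i)
        cos_table[i] = to_fixed(std::cos(PI * i / N));
}

int main() {
    build_cos_table();
    q16_16 a = to_fixed(1.5), b = to_fixed(-0.25);
    std::printf("1.5 * -0.25 ~= %f\n", to_double(q_mul(a, b)));      // -0.375
    std::printf("cos(pi/4)   ~= %f\n", to_double(cos_table[N / 4])); // ~0.7071
    return 0;
}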
    We have also discussed the effects of several recognition parameters on the recognition results. For example, we tried several values of the pruning beam-width in order to strike a balance between recognition rate and computation time. We also explored the scaling factor at various stages, which affects the occurrence of overflow. For easier debugging, we designed a new C++ class that detects overflows and handles them correctly. We hope that these methods can pave the way toward better and more convenient speech-based applications.
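    The overflow-detecting class itself is not reproduced in this record. The following is a minimal sketch of one way such a wrapper could work, assuming a signed 32-bit value that is widened to 64 bits for each operation and saturated when the result does not fit; the class name CheckedInt32 and its reporting and saturation policy are illustrative assumptions, not the thesis' actual design.

#include <cstdint>
#include <cstdio>

// Hypothetical overflow-detecting wrapper around a signed 32-bit integer.
class CheckedInt32 {
public:
    explicit CheckedInt32(int32_t v = 0) : value_(v) {}

    CheckedInt32 operator+(const CheckedInt32& rhs) const {
        return CheckedInt32(clamp((int64_t)value_ + rhs.value_, "add"));
    }
    CheckedInt32 operator*(const CheckedInt32& rhs) const {
        return CheckedInt32(clamp((int64_t)value_ * rhs.value_, "mul"));
    }
    int32_t value() const { return value_; }

private:
    // Do the arithmetic in 64 bits; if the result does not fit in 32 bits,
    // report the overflow and saturate instead of silently wrapping around.
    static int32_t clamp(int64_t wide, const char* op) {
        if (wide > INT32_MAX || wide < INT32_MIN) {
            std::fprintf(stderr, "overflow in %s: %lld\n", op, (long long)wide);
            return wide > 0 ? INT32_MAX : INT32_MIN;
        }
        return (int32_t)wide;
    }

    int32_t value_;
};

int main() {
    CheckedInt32 a(2000000000), b(2000000000);
    CheckedInt32 c = a + b;  // reports the overflow and saturates to INT32_MAX
    std::printf("%d\n", c.value());
    return 0;
}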

    Table of contents:
    Abstract
    Index
    List of figures
    List of tables
    Acknowledgement
    1. Introduction
    1.1 System overview
    1.2 Thesis organization
    2. Previous work
    3. Proposed method
    3.1 Fundamental algorithm
    3.1.1 Feature extraction - MFCC
    3.1.2 Building acoustic model - HMM
    3.1.3 Searching the most-likely path - Viterbi algorithm
    3.2 Implementation on embedded system
    3.2.1 MFCC
    3.2.2 HMM
    3.2.3 Token-passing algorithm
    4. Experimental results
    4.1 Accuracy
    4.2 Recognition rate
    4.3 Speed
    4.4 Parameter estimation
    5. Conclusion and future work
    References


    Full-text availability: not authorized for public access (campus and off-campus networks)
