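| Field | Value |
|---|---|
| Graduate student | 陳奕宏 Yi-Hung Edward Chen |
| Thesis title | 32位元處理器之定點數MFCC演算法的改進與探討 Improvement and Discussion of MFCC Algorithm on 32-bit Fixed-point Processors |
| Advisor | 張智星 Jyh-Shing Roger Jang |
| Oral examination committee | |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
| Year of publication | 2006 |
| Academic year of graduation | 94 (ROC calendar, i.e. 2005–2006) |
| Language | Chinese |
| Pages | 38 |
| Chinese keywords (translated) | Mel-frequency cepstrum, discrete cosine transform, fast Fourier transform, log table, square root table, scaling parameter |
| English keywords | MFCC, DCT, FFT, Log table, Square Root Table, Scale Up/Down Parameter |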
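Driven by rapid technological progress, the share of people who use handheld devices has risen year after year. Examples include Apple's popular iPod MP3 player, HTC's best-selling smartphones (strongly backed by Microsoft), and the arrival of 3G phones with higher bandwidth and richer multimedia. All of these indicate that handheld embedded systems are among the hottest technologies of recent decades. Alongside ever-smaller hardware, the demand for software on these devices has grown substantially, and applications such as real-time video transmission and GPS navigation have appeared one after another. Speech recognition is one of these many applications. Imagine speaking to your handheld MP3 player to find the song you want, or saying "nearest movie theater" to your phone, after which the device plays that song, or displays a map of the nearest theater and tells you which route to take. Unfortunately, these appealing applications are not yet practical, mainly because handheld embedded systems lack the computing power to complete the required speech recognition within an acceptable time.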
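Recently, as new handheld products keep reaching the market and manufacturers compete, these devices have gradually gained more capable CPUs and larger storage to meet consumers' demand for more applications. Despite these hardware improvements, the devices still lack a floating-point unit, so integer data types must replace the floating-point types customarily used in speech recognition. This thesis attempts to build an automated system that, for every step from extracting speech features, through building an integer acoustic model, to the final ASR (automatic speech recognition) system, derives more accurate and reasonable conversion parameters from the given speech corpus, so that the complete ASR system runs well on embedded handheld devices.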
In this thesis, we investigate porting the computation of floating-point MFCCs (Mel-frequency cepstral coefficients) to fixed-point arithmetic, focusing on 32-bit fixed-point processors. Using a data-driven approach, we examine the scaling factor at each stage of the MFCC computation and choose each factor to achieve the highest possible precision while keeping the probability of overflow low. We also propose a binary-search-based table lookup that reduces the required table size. Overall, the proposed method greatly reduces the memory requirement without degrading recognition rates.