
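Author: 林玄松 (Lin, Shiuan Sung)
Thesis title: Viterbi Beam Search Optimization and Multilingual Speech Recognition (Viterbi搜尋的最佳化以及多語系辨識)
Advisors: 張智星 (Jyh-Shing Roger Jang), 呂仁園 (Ren-Yuan Ly), 江永進 (Yuang-Chin Chiang)
Oral defense committee:
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science
Year of publication: 2002
Academic year of graduation: 90
Language: English
Number of pages: 27
Keywords (translated from Chinese): Viterbi search, multilingual, optimization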
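  • Most successfully deployed speech recognition systems are built on Hidden Markov Models (HMMs), which are computationally expensive. The first part of this thesis studies how to optimize Viterbi search in HMMs. It first examines which steps of the recognition process are computational bottlenecks and the methods we use to address them. Based on experiments on a large amount of data, the thesis proposes a method whose resulting curve brings recognition time down to a near-optimal level with the smallest drop in recognition rate. It also examines the relationship between recognition rate and recognition time as the curve changes. Recognition experiments using recordings of 唐詩三百首 (Three Hundred Tang Poems) as test data also confirm the feasibility of the proposed method.
    In the second part of the thesis, we apply the optimization method proposed in the first part to multilingual recognition. The multilingual recognition system can handle Mandarin, Taiwanese, English, and sentences that mix these three languages. Each language has its own acoustic model, which makes system development considerably more flexible. Because the search optimization method from the first part is integrated, overall recognition rate and recognition time stay at a reasonable level even though the multilingual system has more acoustic models and a larger tree lexicon. The thesis also discusses a problem encountered when implementing multilingual recognition: keeping the number of acoustic models as small as possible while still representing the data adequately. We also explain how to merge similar acoustic models when the speech corpus is limited. Experiments show that the method is also feasible for speech recognition over a fixed-domain vocabulary.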


    Most successful speech recognition systems are based on Hidden Markov Models (HMMs), which rely on computation-intensive Viterbi search during recognition. The first part of this thesis focuses on optimizing Viterbi beam search in HMM decoding for isolated-word speech recognition. Given a set of sample data, the proposed data-driven method identifies a near-optimal beam search ranking curve that reduces computation time to an acceptable level while minimizing the loss in recognition rate. Experimental results on recordings of the Three Hundred Tang Poems demonstrate the feasibility of the proposed approach.
    In the second part of this thesis, we apply the proposed approach to a multilingual speech recognition system that handles Mandarin Chinese, Taiwanese, English, and their combinations. The system employs a separate acoustic model for each language and is therefore highly flexible and modular. We also describe how to treat similar phone models as an equivalence class in order to reduce the total number of phone models when the speech corpus is limited. Experimental results demonstrate its feasibility for automatic speech recognition over fixed-domain vocabularies.
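The Viterbi beam search summarized in the first part of the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the log-probability table layout, and the `beam_widths` list (a per-frame count of hypotheses to keep, used here as a hypothetical stand-in for a beam search ranking curve) are all assumptions made for illustration.

```python
import math

NEG_INF = -math.inf


def prune(hyps, width):
    """Keep only the `width` best-scoring hypotheses (rank-based pruning)."""
    ranked = sorted(hyps.items(), key=lambda kv: kv[1][0], reverse=True)
    return dict(ranked[:width])


def viterbi_beam(log_init, log_trans, log_emit, beam_widths):
    """Viterbi decoding with rank-based beam pruning.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log likelihood of frame t under state s
    beam_widths[t]  : number of states kept after frame t (assumed >= 1);
                      a hypothetical per-frame pruning schedule
    Returns (best_log_score, best_state_path).
    """
    n_states = len(log_init)
    # active maps state -> (log score, state path) for surviving hypotheses
    active = {
        s: (log_init[s] + log_emit[0][s], [s])
        for s in range(n_states)
        if log_init[s] > NEG_INF
    }
    active = prune(active, beam_widths[0])
    for t in range(1, len(log_emit)):
        candidates = {}
        # Only states that survived pruning are expanded; this is where
        # the beam saves computation compared with full Viterbi search.
        for prev, (score, path) in active.items():
            for s in range(n_states):
                step = log_trans[prev][s]
                if step == NEG_INF:
                    continue  # transition not allowed
                new_score = score + step + log_emit[t][s]
                if s not in candidates or new_score > candidates[s][0]:
                    candidates[s] = (new_score, path + [s])
        active = prune(candidates, beam_widths[t])
    return max(active.values(), key=lambda h: h[0])
```

Setting every entry of `beam_widths` to the number of states gives ordinary Viterbi search. Smaller values expand fewer hypotheses per frame and so run faster, at the risk of pruning the path that would have scored best, which is the trade-off between recognition time and recognition rate that the abstract describes.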

    Contents
    Part 1: Optimization of Viterbi Beam Search
      1. Introduction
      2. Viterbi Beam Search in HMM Decoding
      3. Proposed Approach to BSRC Optimization
      4. Experimental Results
      5. Conclusion
    Part 2: Multilingual Speech Recognition
      1. Introduction
      2. Acoustic Model Training
      3. Tree Lexicon
      4. Experimental Results and Discussions
      5. Conclusions
    Reference


    Full text availability: not authorized for public release (on-campus network, off-campus network, or the National Central Library's 臺灣博碩士論文系統).