
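Graduate student: 彭郁雅 (Peng, Yu-Ya)
Thesis title: 長音檔與文本之快速對位
Efficient Alignment between Long Utterances and Texts
Advisor: 張智星 (Jang, Jyh-Shing Roger)
Oral defense committee: 張俊盛, 李俊仁
Degree: Master
Department: College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication: 2012
Academic year of graduation: 100
Language: Chinese
Number of pages: 39
Chinese keywords: 長音檔與文字對位 (long utterance-text alignment), 動態規劃 (dynamic programming)
English keywords: Long utterance-text alignment, Dynamic Programming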
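  • When aligning long audio files with texts, HTK's processing limits and lengthy processing time make the alignment difficult to carry out in practice. Previous work has mostly used large-vocabulary speech recognition to convert the audio-to-text alignment problem into a text-to-text alignment problem, but the accuracy of large-vocabulary recognition is limited, and the recognition takes even longer than HTK's processing. This thesis therefore proposes an approach that does not require large-vocabulary speech recognition: dynamic programming is used to match the segmented audio clips to the sentences that correspond to them in time, and HTK then performs forced alignment between the short audio clips and the sentences, with the aim of reducing HTK's processing time.
    Because of a collaborative project, the experimental corpus consists of lecture videos provided by the TED website, with the lecture transcripts as the text. The materials were aligned with HTK, SailAlign (an alignment system that uses large-vocabulary speech recognition), and the dynamic programming method proposed in this thesis. The results show that, because of unpredictable sounds in the corpus (for example, applause and laughter), the dynamic programming method performed worse than HTK and SailAlign, but it reduced the processing time required for alignment by more than 37%, clearly achieving one of the research objectives.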


    This thesis describes our research on the efficient alignment of long utterances to texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for long inputs and causes stack overflows when handling them. Another known system, SailAlign, uses large-vocabulary recognition and can handle longer inputs, but it has lower recognition rates and is more time-consuming. We present a method that does not require large-vocabulary recognition: a dynamic programming algorithm slices and combines each audio file and text into sentence-level segments, which are then force-aligned with HTK, reducing HTK's overall workload.
    We use speech recordings from the TED website as the corpus and the recordings' transcripts as the text. We compare alignment performance on this corpus using HTK, SailAlign, and the proposed DP method. For reasons that have not been verified, the results suggest that the DP method is less accurate, but it reduces processing time by more than 37%.
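    The abstract does not give the cost function or segmentation details of the DP method, so the following is only a minimal sketch of the general idea of matching audio segments to sentences with dynamic programming. It assumes the audio has already been cut into consecutive segments (for example, at silences), that each sentence spans one or more consecutive segments, and that a sentence's expected duration can be estimated from its word count. The function name `align_segments`, the `words_per_second` parameter, and the duration-based cost are all hypothetical and are not taken from the thesis.

```python
from itertools import accumulate


def align_segments(durations, sentences, words_per_second=2.5):
    """Assign consecutive audio segments to sentences, in order.

    durations: list of segment lengths in seconds (hypothetical input).
    sentences: list of transcript sentences, in spoken order.
    Returns one (start, end) segment-index range per sentence, end exclusive.
    """
    n_seg, n_sent = len(durations), len(sentences)
    if n_sent == 0 or n_seg < n_sent:
        raise ValueError("need at least one segment per sentence")

    # Assumed duration model: word count divided by a fixed speaking rate.
    expected = [len(s.split()) / words_per_second for s in sentences]
    prefix = [0.0] + list(accumulate(durations))  # prefix[j] = sum(durations[:j])

    inf = float("inf")
    # cost[i][j]: best cost of covering the first j segments with the first i sentences.
    cost = [[inf] * (n_seg + 1) for _ in range(n_sent + 1)]
    back = [[0] * (n_seg + 1) for _ in range(n_sent + 1)]
    cost[0][0] = 0.0

    for i in range(1, n_sent + 1):
        for j in range(i, n_seg + 1):
            for k in range(i - 1, j):  # sentence i covers segments k..j-1
                if cost[i - 1][k] == inf:
                    continue
                mismatch = abs((prefix[j] - prefix[k]) - expected[i - 1])
                candidate = cost[i - 1][k] + mismatch
                if candidate < cost[i][j]:
                    cost[i][j] = candidate
                    back[i][j] = k

    # Walk the back-pointers from the last sentence to the first.
    spans, j = [], n_seg
    for i in range(n_sent, 0, -1):
        k = back[i][j]
        spans.append((k, j))
        j = k
    spans.reverse()
    return spans
```

    Each returned range identifies a short stretch of audio that could then be passed, together with its sentence, to a forced aligner such as HTK. A duration-only cost like this one would plausibly be thrown off by non-speech audio such as applause or laughter, which lengthens segments without adding words.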

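    Abstract (Chinese)
    Abstract
    Acknowledgments
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Objectives
            1.2.1 Research Questions
        1.3 Definitions of Terms
            1.3.1 Alignment
    Chapter 2 Literature Review
        2.1 Speech-text Alignment
            2.1.1 HTK
            2.1.2 Existing Speech-text Alignment Methods
            2.1.3 SailAlign
        2.2 Dynamic Programming
    Chapter 3 Experimental Methods
        3.1 Experimental Corpus
        3.2 Experimental Framework
            3.2.1 Experiment 1
            3.2.2 Experiment 2
            3.2.3 Evaluation Method
    Chapter 4 Experimental Results and Analysis
        4.1 Analysis of Experiment 1 Data
        4.2 Analysis of Experiment 2 Data
            4.2.1 Comparison of the Performance of HTK Forced Alignment, SailAlign, and the Dynamic Programming Method
            4.2.2 Comparison of the Processing Time of HTK Forced Alignment, SailAlign, and the Dynamic Programming Method
            4.2.3 Error Analysis of the Dynamic Programming Method
    Chapter 5 Conclusions and Future Work
        5.1 Conclusions
        5.2 Future Research Directions
    References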


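    Full-text release date: full text not authorized for public release (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)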
