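| Graduate student | 彭郁雅 Peng, Yu-Ya |
|---|---|
| Thesis title | 長音檔與文本之快速對位 Efficient Alignment between Long Utterances and Texts |
| Advisor | 張智星 Jang, Jyh-Shing Roger |
| Oral defense committee | 張俊盛, 李俊仁 |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications |
| Year of publication | 2012 |
| Academic year of graduation | 100 (ROC calendar) |
| Language | Chinese |
| Number of pages | 39 |
| Keywords (Chinese) | 長音檔與文字對位 (long audio-text alignment), 動態規劃 (dynamic programming) |
| Keywords (English) | Long utterance-text alignment, Dynamic Programming |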
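When aligning long audio files with text, HTK's processing limits and lengthy processing time make the alignment difficult to carry out in practice. Previous work has mostly used large-vocabulary speech recognition to convert the audio-to-text alignment problem into a text-to-text alignment problem, but large-vocabulary recognition yields limited accuracy and takes even longer than HTK. This thesis therefore proposes an approach that does not require large-vocabulary speech recognition: dynamic programming matches each segmented audio clip to the sentences that correspond to it in time, and HTK then performs forced alignment between each short audio clip and its sentences, with the aim of reducing HTK's processing time.
Because of a collaborative project, the experimental corpus consists of talk videos provided by the TED website, with the transcripts of the talks as the text. The materials were aligned with HTK, with SailAlign (an alignment system based on large-vocabulary speech recognition), and with the dynamic programming method proposed in this thesis. The results show that unpredictable sounds in the corpus (for example, applause and laughter) made the dynamic programming method less accurate than HTK and SailAlign, but the time required for alignment fell by more than 37%, which clearly meets one of the research goals.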
This thesis describes our research on the efficient alignment of long utterances to texts. Currently, the most widely used toolkit for speech-to-text alignment is the Hidden Markov Model Toolkit (HTK), but HTK is not designed for long inputs and can cause stack overflows when handling them. Another known system, SailAlign, uses large-vocabulary recognition and can handle longer inputs, but it has lower recognition rates and is more time-consuming. We present a method that does not require large-vocabulary recognition: a dynamic programming (DP) algorithm matches each segmented audio clip to the sentences that correspond to it in time, and HTK forced alignment is then run on each short clip and its sentences, reducing HTK's overall workload.
We use speech recordings from the TED website as the corpus and the recordings' transcripts as the text. We compare alignment performance on this corpus using HTK, SailAlign, and the proposed DP method. The results suggest that the DP method is less accurate than HTK and SailAlign, apparently because of unpredictable sounds in the recordings such as applause and laughter, but it reduces processing time by more than 37%.
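The abstract does not give the cost function or the segmentation procedure, so the following is only a minimal sketch of how a dynamic programming step of this kind could assign sentences to audio clips. It rests on assumptions that are not from the thesis: the function name `align_segments_to_sentences` is hypothetical, each sentence's spoken duration is estimated as proportional to its character count, and the cost of an assignment is the absolute difference between a clip's duration and the estimated duration of its sentences. The later HTK forced alignment of each clip is not shown.

```python
from typing import List, Tuple


def align_segments_to_sentences(
    segment_durations: List[float],
    sentences: List[str],
) -> List[Tuple[int, int]]:
    """For each audio segment, return the half-open range [start, end)
    of sentence indices assigned to it (hypothetical helper)."""
    n_seg, n_sent = len(segment_durations), len(sentences)
    total_audio = sum(segment_durations)
    total_chars = sum(len(s) for s in sentences) or 1

    # Assumption: spoken duration is proportional to sentence length.
    # prefix[j] is the estimated duration of the first j sentences.
    prefix = [0.0]
    for s in sentences:
        prefix.append(prefix[-1] + total_audio * len(s) / total_chars)

    inf = float("inf")
    # cost[i][j]: best cost of assigning the first j sentences
    # to the first i segments, keeping both sequences in order.
    cost = [[inf] * (n_sent + 1) for _ in range(n_seg + 1)]
    back = [[0] * (n_sent + 1) for _ in range(n_seg + 1)]
    cost[0][0] = 0.0

    for i in range(1, n_seg + 1):
        for j in range(n_sent + 1):
            # Segment i-1 takes sentences k..j-1 (possibly none,
            # e.g. a clip that contains only applause).
            for k in range(j + 1):
                if cost[i - 1][k] == inf:
                    continue
                mismatch = abs(segment_durations[i - 1] - (prefix[j] - prefix[k]))
                candidate = cost[i - 1][k] + mismatch
                if candidate < cost[i][j]:
                    cost[i][j] = candidate
                    back[i][j] = k

    # Trace back from the state where every sentence has been assigned.
    ranges: List[Tuple[int, int]] = []
    j = n_sent
    for i in range(n_seg, 0, -1):
        k = back[i][j]
        ranges.append((k, j))
        j = k
    ranges.reverse()
    return ranges


# Three clips of 2 s, 4 s and 2 s; four equally long sentences.
print(align_segments_to_sentences([2.0, 4.0, 2.0], ["aaaa", "bbbb", "cccc", "dddd"]))
# prints [(0, 1), (1, 3), (3, 4)]
```

This sketch runs in time proportional to the number of clips times the square of the number of sentences, which is acceptable for illustration. A duration-only cost like this one has no way to recognise applause or laughter, the kind of sound the abstract says reduced the method's accuracy.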