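Automatic Determination of Speech Pause in Oral Presentation | National Tsing Hua University Theses and Dissertations Repository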


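Graduate student:	黃思茹 Huang, Szu-Ju
Thesis title:	針對口頭演講自動推薦演講停頓點 Automatic Determination of Speech Pause in Oral Presentation
Advisors:	張智星 Jang, Jyh-Shing; 張俊盛 Chang, Jason S.
Oral defense committee:	徐嘉連 Hsu, Jia-Lien; 呂仁園 Lyu, Ren-yuan
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Institute of Information Systems and Applications
Year of publication:	2014
Academic year of graduation:	102 (ROC calendar)
Language:	English
Number of pages:	49
Chinese keywords:	停頓點推薦 (pause suggestion)
Foreign-language keywords:	Pause suggestion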

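We often treat punctuation marks as the places in a sentence where a speaker can breathe and pause. However, not every pause falls on a punctuation mark, and not every punctuation mark calls for a pause. In this thesis, we present a system that automatically recommends appropriate pause points for speech scripts submitted by learners of English. Our method removes the punctuation from the script and generates suitable features. It involves automatically generating training data annotated with pause points, automatically deriving textual features from that training data, and automatically training a classifier to help identify pause points. The final evaluation shows that the proposed method performs well at marking pause points.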

Punctuation marks in text are usually taken as breath pauses. However, not all pauses occur at punctuation marks, and not all punctuation marks are meant to be pauses. In this thesis, we introduce a method for suggesting speech pauses for a given script submitted by English language learners. In our approach, a text is transformed into a non-punctuated text with features aimed at suggesting appropriate pauses in speech. The method involves automatically generating training data annotated with pauses, automatically transforming the training data into linguistic features, and automatically training a discriminative classifier. Evaluation shows that the proposed method achieves satisfactory performance in suggesting pauses for a given speech script.
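The three steps named in the abstract (pause-annotated training data, features from non-punctuated text, a discriminative classifier) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the `|` pause marker, all function names, the word-context features, and the small logistic-regression trainer are hypothetical. The table of contents lists speech-text forced alignment as the step that produces pause annotations; that step is not reproduced here, and the annotated script is simply assumed as input.

```python
import math
import string

PAUSE = "|"  # hypothetical marker: a pause follows the preceding word


def parse_annotated(text):
    """Strip punctuation; return (words, labels) where labels[i] == 1
    means a pause was annotated after words[i]."""
    words, labels = [], []
    for token in text.split():
        if token == PAUSE:
            if labels:
                labels[-1] = 1
            continue
        word = token.strip(string.punctuation).lower()
        if word:
            words.append(word)
            labels.append(0)
    return words, labels


def gap_features(words, i):
    """Illustrative features for the gap between words[i] and words[i + 1]."""
    return ["bias", "prev=" + words[i], "next=" + words[i + 1]]


def build_examples(words, labels):
    """One (features, label) pair per gap between adjacent words."""
    return [(gap_features(words, i), labels[i]) for i in range(len(words) - 1)]


def pause_probability(weights, feats):
    """Logistic probability that a pause belongs at the gap."""
    score = sum(weights.get(f, 0.0) for f in feats)
    score = max(-30.0, min(30.0, score))  # keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=50, lr=0.5):
    """Binary logistic regression fitted with plain stochastic gradient steps."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            error = label - pause_probability(weights, feats)
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * error
    return weights


def suggest_pauses(weights, text, threshold=0.5):
    """Return the unpunctuated script with PAUSE inserted at predicted gaps."""
    words, _ = parse_annotated(text)
    out = []
    for i, word in enumerate(words):
        out.append(word)
        is_gap = i + 1 < len(words)
        if is_gap and pause_probability(weights, gap_features(words, i)) >= threshold:
            out.append(PAUSE)
    return " ".join(out)


# Toy usage: train on one annotated script, then mark pauses in a plain script.
training_script = "Thank you | very much | for coming."
model = train(build_examples(*parse_annotated(training_script)))
print(suggest_pauses(model, "Thank you very much for coming."))
```

A real system would need far more training data and richer features than the neighboring words used here; the sketch is only meant to show how labels, features, and a classifier fit together.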

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
Introduction
Related Work
Method
  1 Problem Statement
  2 Learning to Suggest Appropriate Pauses
    2.1 Speech-Text Alignment (Forced Alignment)
    2.2 Pause Candidate Selection
    2.3 Feature Generation
    2.4 Training a Machine Learning Classifier
  3 Run-Time Pauses Suggesting
Experimental Setting
  1 Resources
  2 Evaluation Metrics
    2.1 Evaluation Metrics
    2.2 Consistency of Manually Annotated Pauses
  3 Detail Setting of the Proposed Method
    3.1 Acoustic Models Training Setting
    3.2 Threshold Determination
    3.3 Feature Generation
  4 Machine Learning Classifiers Compared
Evaluation
  1 Classifiers Comparison
  2 Feature Set Comparison
  3 Overall Performance
Conclusion and Future Work
References

                                


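Full-text release date: full text not authorized for public release (on-campus network)
Full-text release date: full text not authorized for public release (off-campus network)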
