Graduate student: | 宋光婷 Sung, Kuang-Ting |
---|---|
Thesis title: | 國語連續語音訊號中阻塞音偵測與辨識之研究 A Study on Detection and Recognition of Obstruents in Continuous Mandarin Speech |
Advisor: | 王小川 Wang, Hsiao-Chuan |
Oral defense committee: | |
Degree: | Master |
Department: | College of Electrical Engineering and Computer Science - Department of Electrical Engineering |
Year of publication: | 2006 |
Academic year of graduation: | 94 (ROC calendar) |
Language: | Chinese |
Number of pages: | 90 |
Chinese keywords: | 基於知識的聲學語音學特徵、史奈夫聽覺模型、阻塞音偵測、阻塞音分類、阻塞音辨識 |
English keywords: | knowledge-based acoustic-phonetic feature, Seneff auditory model, obstruent detection, obstruent classification, obstruent recognition |
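Next-generation speech recognition technology will combine knowledge-based methods with data-driven modeling to build recognition systems that incorporate phonetic and linguistic knowledge, and will use multiple sets of feature parameters in place of a single set of acoustic feature parameters. Beyond the traditional spectral parameters, multiple sets of speech features can be extracted to detect speech events such as manner of articulation, place of articulation, and the shape of the oral cavity.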
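Based on this concept, this thesis detects silence, sonorants, and obstruents in continuous Mandarin speech, and then studies the classification and recognition of the obstruents. Mandarin has 18 obstruent phonemes, which fall into three types according to manner of articulation: stops, fricatives, and affricates. Stops and affricates are each divided into unaspirated and aspirated types according to whether air is expelled during articulation, and each group comprises six phonemes at three places of articulation. Fricatives are divided into voiceless and voiced types according to whether the vocal folds vibrate, and comprise six phonemes at five places of articulation. This thesis investigates feature parameters that can distinguish manner and place of articulation, uses labeled training speech to compile the distribution of each feature parameter, and from those distributions determines the thresholds used for speech event detection and recognition.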
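The speech feature parameters in this thesis are derived from an auditory model, which simulates how the human auditory nerve processes sound signals. The thesis adopts the Seneff auditory model as the front-end processor and computes the feature parameters from its two outputs, the envelope spectrum and the synchrony spectrum.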
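The abstract above describes setting detection thresholds from the distributions of features measured on labeled training speech. The record does not say how the thresholds are chosen, so the following is only a minimal sketch of one common approach: scan candidate thresholds for a single scalar feature and keep the one with the best training accuracy. The function name `pick_threshold` and the sample values are hypothetical and are not taken from the thesis.

```python
from typing import Sequence, Tuple


def pick_threshold(low_class: Sequence[float],
                   high_class: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, training_accuracy) for the rule
    'value > threshold means high class'.

    Assumption: one scalar feature and two labeled classes. This is an
    illustrative criterion, not the thesis's documented procedure.
    """
    values = sorted(set(low_class) | set(high_class))
    # Candidate thresholds are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct feature values")

    total = len(low_class) + len(high_class)
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        correct = sum(v <= t for v in low_class) + sum(v > t for v in high_class)
        acc = correct / total
        if acc > best_acc:  # strict comparison keeps the first best candidate
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical feature values for two labeled classes.
threshold, accuracy = pick_threshold([1.0, 2.0, 3.0], [5.0, 6.0, 7.0])
```

With overlapping distributions no threshold is perfect, and the returned accuracy shows how much of the training data the single feature separates.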
This thesis studies acoustic-phonetic features for obstruent detection and classification based on knowledge of Mandarin speech. The Seneff auditory model is used as the front-end processor for extracting the acoustic-phonetic features. These features carry enough information to drive a hierarchical decision process that detects and classifies Mandarin obstruents. Preliminary experiments show that the accuracy of obstruent detection is about 84%. An algorithm based on the feature distributions then classifies the obstruents into stops, fricatives, and affricates, with an average accuracy of about 80%. The proposed approach based on feature distributions is simple and effective, and it is a promising method for finding acoustic-phonetic features for phone recognition in continuous speech recognition.
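The hierarchical decision process mentioned in the abstract can be pictured as a cascade of threshold tests: first separate silence, sonorants, and obstruents, then split obstruents into stops, fricatives, and affricates. The sketch below is a toy illustration under stated assumptions. The features (`total_energy`, `low_band_ratio`, `has_closure`, `noise_ms`) and every threshold value are hypothetical stand-ins, not the features the thesis derives from the Seneff model.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical thresholds; in practice they would come from training data."""
    silence_energy: float      # total energy below this is treated as silence
    sonorant_low_ratio: float  # low-band energy share above this is sonorant
    burst_max_ms: float        # noise after a closure shorter than this is a stop


def classify_frame(total_energy: float, low_band_ratio: float,
                   th: Thresholds) -> str:
    """First stage: silence, sonorant, or obstruent."""
    if total_energy < th.silence_energy:
        return "silence"
    if low_band_ratio > th.sonorant_low_ratio:
        return "sonorant"
    return "obstruent"


def classify_obstruent(has_closure: bool, noise_ms: float,
                       th: Thresholds) -> str:
    """Second stage: stop, fricative, or affricate.

    Toy rule: no preceding closure means fricative; a closure followed by
    short noise means stop; a closure followed by long noise means affricate.
    """
    if not has_closure:
        return "fricative"
    return "stop" if noise_ms <= th.burst_max_ms else "affricate"


th = Thresholds(silence_energy=0.05, sonorant_low_ratio=0.6, burst_max_ms=30.0)
first_stage = classify_frame(total_energy=1.0, low_band_ratio=0.2, th=th)
second_stage = classify_obstruent(has_closure=True, noise_ms=80.0, th=th)
```

A duration rule this simple would confuse aspirated stops with affricates, which is one reason a real system combines several features at each level of the hierarchy.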