
Author: Sung, Kuang-Ting (宋光婷)
Title: A Study on Detection and Recognition of Obstruents in Continuous Mandarin Speech (國語連續語音訊號中阻塞音偵測與辨識之研究)
Advisor: Wang, Hsiao-Chuan (王小川)
Committee members:
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2006
Academic year of graduation: 94 (2005/2006)
Language: Chinese
Number of pages: 90
Keywords: knowledge-based acoustic-phonetic features, Seneff auditory model, obstruent detection, obstruent classification, obstruent recognition
Next-generation speech recognition technology is expected to combine knowledge-based approaches with data-driven modeling, building recognition systems that embed phonetic and linguistic knowledge, and replacing the single set of acoustic feature parameters used in conventional recognizers with multiple feature sets. In addition to traditional acoustic spectral parameters, multiple sets of speech features can be extracted to detect speech events such as manner of articulation, place of articulation, and oral-cavity shape.
Based on this concept, this thesis studies the detection of silence, sonorants, and obstruents in continuous Mandarin speech, followed by the classification and recognition of the obstruents. Mandarin has 18 obstruent phonemes, which can be divided by manner of articulation into three types: stops, fricatives, and affricates. Stops and affricates can each be further divided into unaspirated and aspirated types according to whether aspiration accompanies the articulation, and each of these two categories comprises six phonemes produced at three different places of articulation. Fricatives can be divided into voiceless and voiced types according to whether the vocal folds vibrate, comprising six phonemes produced at five different places of articulation. This thesis investigates feature parameters that can distinguish manner and place of articulation, estimates the distribution of each feature over a labeled training corpus, and uses these distributions to set the thresholds for speech-event detection and recognition.
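As a rough illustration of how a detection threshold might be derived from the distributions of a feature over labeled training frames, the Python sketch below sweeps candidate thresholds and keeps the one that best separates two labeled classes. The function and variable names (`choose_threshold`, `obstruent_vals`, and so on) are invented for the example and are not taken from the thesis.

```python
import numpy as np

def choose_threshold(obstruent_vals, non_obstruent_vals, num_candidates=200):
    """Pick the scalar threshold that best separates two labeled feature
    distributions (e.g. an obstruent cue vs. the same cue on other frames).

    This is only a sketch of deriving a decision threshold from labeled
    training statistics; the thesis's actual procedure may differ.
    """
    lo = min(obstruent_vals.min(), non_obstruent_vals.min())
    hi = max(obstruent_vals.max(), non_obstruent_vals.max())
    candidates = np.linspace(lo, hi, num_candidates)

    best_thr, best_acc = lo, 0.0
    for thr in candidates:
        # Assume obstruent frames tend to have feature values above thr.
        acc = 0.5 * ((obstruent_vals >= thr).mean()
                     + (non_obstruent_vals < thr).mean())
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc

# Hypothetical usage with synthetic feature values:
rng = np.random.default_rng(0)
obs = rng.normal(3.0, 1.0, 1000)      # stand-in for obstruent frames
non_obs = rng.normal(0.0, 1.0, 1000)  # stand-in for sonorant/silence frames
print(choose_threshold(obs, non_obs))
```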
The speech feature parameters in this thesis are derived from an auditory model, which simulates how the human auditory nerve processes acoustic signals. The Seneff auditory model is adopted as the front-end processor, and two of its outputs, the envelope spectrum and the synchrony spectrum, are used to compute the feature parameters.
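To make the use of the two auditory-model outputs concrete, here is a minimal sketch that computes two plausible per-frame cues from an envelope-spectrum frame and a synchrony-spectrum frame. The 40-channel dimension, the band split, and both function names are assumptions made for illustration, not the thesis's actual feature definitions.

```python
import numpy as np

def band_energy_ratio(envelope_frame, low_band=slice(0, 20), high_band=slice(20, 40)):
    """Ratio of high-band to low-band energy in one envelope-spectrum frame.

    Obstruents (especially fricatives) tend to concentrate energy in the
    higher critical bands, so a ratio of this kind is one plausible cue;
    the exact features used in the thesis are not reproduced here.
    """
    low = envelope_frame[low_band].sum() + 1e-10   # avoid division by zero
    high = envelope_frame[high_band].sum()
    return high / low

def spectral_peakiness(synchrony_frame):
    """Peak-to-mean ratio of one synchrony-spectrum frame, a rough measure
    of how strongly the frame locks onto a dominant formant-like peak."""
    return synchrony_frame.max() / (synchrony_frame.mean() + 1e-10)

# Hypothetical usage on random stand-in spectra (40 channels per frame):
frame_env = np.abs(np.random.randn(40))
frame_syn = np.abs(np.random.randn(40))
print(band_energy_ratio(frame_env), spectral_peakiness(frame_syn))
```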


A study on acoustic-phonetic features for obstruent detection and classification, based on knowledge of Mandarin speech, is proposed. The Seneff auditory model is used as the front-end processor for extracting acoustic-phonetic features. These features carry rich information for a hierarchical decision process that detects and classifies Mandarin obstruents. Preliminary experiments showed that the accuracy of obstruent detection is about 84%. An algorithm based on feature-distribution information is then applied to classify the obstruents into stops, fricatives, and affricates, with an average accuracy of about 80%. The proposed feature-distribution approach is simple and effective, and is a promising way to search for acoustic-phonetic features for phone recognition in continuous speech recognition.
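The hierarchical decision process mentioned above can be pictured as a cascade of threshold tests. The sketch below is a hypothetical illustration of such a cascade (silence, then sonorant, then obstruent, then a coarse manner class); the cue names, thresholds, and ordering are invented for the example and do not reproduce the thesis's actual decision rules.

```python
def classify_frame(features, thresholds):
    """Hierarchical frame decision sketch: silence -> sonorant -> obstruent,
    then a coarse manner class for obstruent frames.

    `features` and `thresholds` are hypothetical dictionaries of scalar
    cues and their learned decision thresholds.
    """
    if features["energy"] < thresholds["silence_energy"]:
        return "silence"
    if features["low_band_ratio"] > thresholds["sonorant_ratio"]:
        return "sonorant"
    # Remaining frames are treated as obstruent; refine by manner.
    if features["burst_strength"] > thresholds["stop_burst"]:
        return "stop"
    if features["noise_duration"] > thresholds["fricative_duration"]:
        return "fricative"
    return "affricate"

# Hypothetical usage:
feats = {"energy": 0.8, "low_band_ratio": 0.2,
         "burst_strength": 0.1, "noise_duration": 0.9}
thrs = {"silence_energy": 0.05, "sonorant_ratio": 0.6,
        "stop_burst": 0.5, "fricative_duration": 0.4}
print(classify_frame(feats, thrs))  # -> "fricative"
```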

Chapter 1 Introduction
  1.1 Motivation
  1.2 Properties of Mandarin pronunciation
  1.3 Research direction
  1.4 Thesis organization
Chapter 2 Front-end processor: the Seneff auditory model
  2.1 Architecture of the Seneff auditory model
  2.2 Stage 1: linear critical-band filtering
    2.2.1 Preprocessing filter
    2.2.2 Cascaded zero filter bank
    2.2.3 Resonator filter bank
    2.2.4 Composite critical-band filter bank
  2.3 Stage 2: hair-cell and nerve model
    2.3.1 Half-wave rectification
    2.3.2 Short-term adaptation
    2.3.3 Low-pass filtering
    2.3.4 Rapid automatic gain control
  2.4 Stage 3: envelope detector and synchrony detector
    2.4.1 Envelope detector
    2.4.2 Generalized synchrony detector
    2.4.3 Average localized synchrony detector
Chapter 3 Obstruent detection and classification
  3.1 Obstruent detection
    3.1.1 Processing flow
    3.1.2 Step 1: silence detection
    3.1.3 Step 2: sonorant detection
    3.1.4 Step 3: obstruent detection
    3.1.5 Step 4: continuity constraints
    3.1.6 Experiment 3.1: obstruent detection
      3.1.6.1 Speech corpus
      3.1.6.2 Evaluation (1): frame-label confusion matrix
      3.1.6.3 Evaluation (2): obstruent detection performance
      3.1.6.4 Discussion of results
  3.2 Obstruent classification
    3.2.1 Processing flow
    3.2.2 Acoustic feature extraction
    3.2.3 Gaussian mixture models and the maximum-likelihood criterion
    3.2.4 Results and discussion
      3.2.4.1 Speech corpus
      3.2.4.2 Experiment 3.2.1: obstruent classification
      3.2.4.3 Experiment 3.2.2: stop vs. non-stop classification
      3.2.4.4 Discussion of results
Chapter 4 Fricative recognition
  4.1 Processing flow
  4.2 Voicing detection
  4.3 Place-of-articulation detection
  4.4 Results and discussion
    4.4.1 Speech corpus
    4.4.2 Experiment 4.1: voicing detection
    4.4.3 Experiment 4.2: place-of-articulation detection
    4.4.4 Experiment 4.3: joint voicing and place-of-articulation detection
    4.4.5 Discussion of results
Chapter 5 Stop recognition
  5.1 Processing flow
  5.2 Aspiration detection
  5.3 Place-of-articulation detection
  5.4 Results and discussion
    5.4.1 Speech corpus
    5.4.2 Experiment 5.1: aspiration detection
    5.4.3 Experiment 5.2: place-of-articulation detection
    5.4.4 Experiment 5.3: joint aspiration and place-of-articulation detection
    5.4.5 Discussion of results
Chapter 6 Conclusion and future work
  6.1 Conclusion
  6.2 Future work
References
Appendix 1: Critical-band filter bank parameter settings and passband ranges
Appendix 2: TCC300 read-speech sentences used in the experiments
Appendix 3: Mandarin phonetic-symbol recording corpus


Full-text availability: not authorized for public access (campus network and off-campus network)
