Graduate student: Lao, Shung-You (羅祥友)
Thesis title: Voice/nonvoice Detection Based on Fundamental Frequency and Harmonic Structure (基於基頻與倍頻結構之語音偵測研究)
Advisor: Liu, Yi-Wen (劉奕汶)
Oral examination committee: Chen, Chien-Yu (陳倩瑜); Jang, Jyh-Shing (張智星)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2012
Academic year of graduation: 100 (ROC calendar)
Language: Chinese
Number of pages: 63
Keywords: voice detection, harmonic structure
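Chinese abstract (translated):
    This thesis studies voice detection, the pre-processing stage of a sound-based emergency-care system. Such a system cannot assume that a call for help will be made within a preset time. It must therefore perform voice detection so that, at any point in time, it can decide whether the captured sound is an everyday sound or speech. Moreover, the system captures sound from a distance. Deciding by volume or by a weighted SNR comparison alone can produce errors, because everyday sounds are not necessarily quieter or lower in weighted SNR than speech. This thesis therefore proposes an improved method that exploits the periodicity of speech to derive two features that separate voice from nonvoice: the fundamental frequency and the harmonic structure. The fundamental frequency is the pitch of a sound, and this thesis computes it with the autocorrelation function. Because speech is periodic, its spectrum shows peaks at both the fundamental frequency and its harmonics, and this characteristic is used to decide whether the input sound is speech. Analysis on the collected corpus shows that at SNRs of 5 dB and above, the false positive rate is within 28% and the miss rate is within 10%.
    Spectral analysis also shows that sounds produced by musical instruments (hereafter "instrumental music") have harmonic structure. However, the spectral energy of instrumental music fluctuates far more than that of speech, and this fluctuation is quantified as total variation (TV). A purpose-designed formula that measures this fluctuation separates speech from instrumental music effectively: on the collected corpus at an SNR of 10 dB, the false positive rate is within 11% and the miss rate is within 20%. The corpus includes three sources:
    (1) speech and everyday sounds of elderly residents, recorded at the Shuang-Lien Elderly Center (雙連安養中心) and the Yan-Chai Elderly Center (仁濟安老所) in Taipei;
    (2) everyday sounds and speech recorded by students in the laboratory;
    (3) coughing sounds recorded with the help of 新竹小太陽醫院.
    On the hardware side, a decimation-in-time algorithm computes the signal spectrum used to observe the harmonic structure. The fast Fourier transform also shortens the time needed to find the fundamental frequency with the autocorrelation function. These methods were implemented on a DSP development board provided by TI. The implementation mainly uses two hardware communication mechanisms: interrupt service routines (Interrupt) and the Multi-channel Buffered Serial Port (McBSP).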
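The decimation-in-time step mentioned above splits an N-point DFT into two N/2-point DFTs, one over the even-indexed samples and one over the odd-indexed samples. It then merges them with butterfly operations, which reduces the cost from O(N²) to O(N log N).

The Python sketch below pairs such an FFT with one possible harmonic-structure check. Given a pitch estimate f0, it counts how many of the first few multiples k·f0 fall on a local peak of the magnitude spectrum. This is an illustration under assumptions, not the thesis's code. The names fft_dit and harmonic_peak_ratio, the five harmonics, the ±1-bin tolerance and the 1% relative floor are all hypothetical choices.

```python
import cmath
import math


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_dit(x[0::2])  # N/2-point DFT of the even-indexed samples
    odd = fft_dit(x[1::2])   # N/2-point DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor W_N^k
        out[k] = even[k] + t           # butterfly, first half
        out[k + n // 2] = even[k] - t  # butterfly, second half
    return out


def harmonic_peak_ratio(frame, fs, f0, n_harmonics=5, tol_bins=1, rel_floor=0.01):
    """Hypothetical harmonic-structure score in [0, 1].

    Counts how many of the first n_harmonics multiples of f0 land on a local
    maximum of the magnitude spectrum that exceeds rel_floor times the largest
    magnitude. All parameter defaults are illustrative assumptions.
    """
    n = len(frame)
    mag = [abs(c) for c in fft_dit(frame)[: n // 2 + 1]]  # bins 0..Nyquist
    floor = rel_floor * max(mag)
    hits = 0
    for k in range(1, n_harmonics + 1):
        b = round(k * f0 * n / fs)  # bin nearest to the k-th harmonic
        if b + tol_bins + 1 >= len(mag):
            break  # too close to Nyquist; remaining harmonics count as misses
        lo = max(b - tol_bins, 1)
        window = mag[lo : b + tol_bins + 1]
        peak = lo + window.index(max(window))  # strongest bin near k*f0
        if mag[peak] > floor and mag[peak - 1] < mag[peak] > mag[peak + 1]:
            hits += 1
    return hits / n_harmonics
```

Consider a 512-point frame sampled at 8 kHz:

- A 250 Hz fundamental with four overtones puts every harmonic exactly on an FFT bin, so the score is 1.0.
- A pure 250 Hz sine scores 0.2, because only the fundamental clears the floor.

On a DSP board, an in-place iterative version with bit-reversed addressing would be the usual choice rather than recursion.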


English abstract:
    This thesis studies voice detection as the front-end processing stage of Voicecare, a system designed to understand human calls for help. Since we cannot assume that calls for help occur at any predetermined time, a Voicecare device needs a voice detector that decides, at any moment, whether a captured sound is voice or a daily sound. Moreover, the device picks up distant sounds. Deciding by volume or weighted SNR alone can therefore cause errors, because daily sounds are not necessarily quieter, or lower in weighted SNR, than voice. This thesis addresses the problem by using the fundamental frequency and the harmonic structure to differentiate voice from nonvoice. The fundamental frequency corresponds to the pitch of a voice, and this thesis computes it with the autocorrelation function. Because voice signals are periodic, their spectra show peaks at the fundamental frequency and its harmonics, so voice and nonvoice can be distinguished based on these characteristics. Experiments on the collected corpus show that, at SNRs of 5 dB and above, the false positive rate is within 28% and the miss rate is within 10%.
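The autocorrelation step described above can be sketched as follows. For a periodic frame, the normalized autocorrelation r(τ) peaks when the lag τ equals the pitch period, so f0 ≈ fs/τ at the best lag.

This is a minimal pure-Python illustration, not the thesis's implementation. The function name acf_pitch, the 80 to 400 Hz search range and the normalization by r(0) are assumptions. The returned peak value could serve as a periodicity score, but any voicing threshold would have to be tuned on data.

```python
import math


def acf_pitch(frame, fs, f_min=80.0, f_max=400.0):
    """Estimate f0 as fs / (lag maximising the normalised autocorrelation).

    Returns (f0_hz, peak_value), or (None, 0.0) for a silent frame.
    The 80-400 Hz search range is an illustrative assumption.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC so it does not dominate the ACF
    r0 = sum(s * s for s in x)
    if r0 == 0:
        return None, 0.0
    lag_min = max(1, int(fs / f_max))
    lag_max = min(n - 1, int(fs / f_min))
    best_lag, best_r = None, -math.inf
    for lag in range(lag_min, lag_max + 1):
        # biased estimate (fewer terms at larger lags), divided by r(0); for a
        # strictly periodic frame this makes r(2T) smaller than r(T)
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag, best_r
```

Consider a 400-sample frame at fs = 8000 Hz built from a 200 Hz fundamental plus three harmonics. The period is 40 samples, so the sketch returns f0 = 200.0 Hz. The peak value is about 0.9, reflecting the 360 overlapping samples out of 400.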
    By observing the magnitude spectra of instrumental music and voice, we found that instrumental music, which also has harmonic structure, is misclassified as voice. However, the magnitude spectrum of instrumental music varies far more dramatically than that of voice. Based on this observation, instrumental music and voice can be separated by calculating the total variation (TV) of the magnitude spectrum. Experiments show that, at an SNR of 10 dB, the false positive rate is within 11% and the miss rate is within 20%.
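The abstract does not spell out the designed formula for spectral fluctuation. The following is therefore only a sketch of the general idea, using the textbook total variation TV = Σ_k |X[k+1] − X[k]| divided by Σ_k |X[k]|. The division is an assumption, made so that the score does not depend on loudness.

A spectrum made of a few sharp, isolated peaks scores higher than one whose magnitude changes gradually between bins. The abstract attributes this kind of larger fluctuation to instrumental music. The helper is_instrumental and its threshold argument are hypothetical.

```python
def spectral_total_variation(mag):
    """Total variation of a magnitude spectrum, divided by its total magnitude.

    TV = sum_k |mag[k+1] - mag[k]|; the division by sum_k mag[k] is an
    assumed normalisation so that scaling the whole spectrum (a louder or
    quieter input) leaves the score unchanged.
    """
    total = sum(mag)
    if total == 0:
        return 0.0  # silent frame: define the score as zero
    tv = sum(abs(mag[k + 1] - mag[k]) for k in range(len(mag) - 1))
    return tv / total


def is_instrumental(mag, threshold):
    """Hypothetical decision rule; the threshold must be fitted on labeled data."""
    return spectral_total_variation(mag) > threshold
```

For example, [0, 4, 0, 4, 0] scores 2.0, while [1, 2, 3, 2, 1] scores 4/9. A threshold separating voice from music would have to be chosen from labeled recordings; none is taken from the thesis.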
    The corpus includes three sources:
    (1) voice and daily sounds of elderly residents, recorded at the Shuang-Lien Elderly Center (台北雙連安養中心) and the Yan-Chai Elderly Center (台北仁濟安老所) in Taipei;
    (2) voice and daily sounds recorded by classmates in the Acoustic and Hearing Laboratory;
    (3) coughing sounds provided by 新竹小太陽診所 in Hsinchu.
    For the hardware implementation, the spectrum used to observe the harmonic structure is computed with a decimation-in-time FFT. The fast Fourier transform is also used to shorten the computation time of the autocorrelation function. These methods were implemented on a DSP board (Texas Instruments C6416). Hardware communication mechanisms, namely interrupts and the Multi-channel Buffered Serial Port (McBSP), enable real-time operation on the board.
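The FFT speed-up for the autocorrelation relies on the Wiener–Khinchin relation: the inverse DFT of the power spectrum |X[k]|² is the circular autocorrelation of x. Zero-padding the frame to at least 2N − 1 samples makes the circular result equal to the ordinary linear autocorrelation for lags 0 to N − 1. The cost then drops from O(N²) for direct summation to O(N log N).

The pure-Python sketch below is self-contained, carrying its own small radix-2 FFT. It is not the thesis's DSP code, and the helper names _fft and acf_via_fft are hypothetical.

```python
import cmath
import math


def _fft(x, inverse=False):
    """Recursive radix-2 FFT for power-of-two lengths; unscaled in both directions."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = _fft(x[0::2], inverse)
    odd = _fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def acf_via_fft(frame):
    """Linear autocorrelation r[lag] = sum_i x[i] * x[i + lag], lag = 0..N-1."""
    n = len(frame)
    size = 1
    while size < 2 * n:  # pad to >= 2N so circular wrap-around cannot alias
        size *= 2
    spec = _fft(list(frame) + [0.0] * (size - n))
    power = [abs(c) ** 2 for c in spec]  # |X[k]|^2: DFT of the autocorrelation
    r = _fft(power, inverse=True)
    return [v.real / size for v in r[:n]]  # undo the unscaled inverse transform
```

The lag of the largest value of acf_via_fft(frame) within the plausible pitch-period range then gives the period, just as with direct time-domain summation.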

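Table of contents:
    Chinese Abstract; Abstract; Acknowledgments; Table of Contents; List of Figures; List of Tables
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Research Content
      1.3 Chapter Outline
    Chapter 2 Literature Review
      2.1 Volume
      2.2 Zero-Crossing Rate
      2.3 Spectral Flatness Measure (SFM)
      2.4 Long-Term Spectral Divergence (LTSD)
      2.5 Discrete Cosine Transform Combined with Cepstral Analysis
      2.6 Summary and Discussion
    Chapter 3 Fundamental Frequency, Harmonic Structure, and Corpus Environment
      3.1 Extracting the Fundamental Frequency of Time-Domain Signals with the Autocorrelation Function (ACF)
      3.2 Hilbert Transform
      3.3 Harmonic Structure
      3.4 Program Architecture
      3.5 Total Variation (TV) of Spectral Energy
      3.6 Corpus Environment
    Chapter 4 Analysis and Discussion
      4.1 Recording System, Simulated Room Response, and Noise Addition
      4.2 ROC Principles and Analysis Results
      4.3 Analysis of SNR and Room Response
    Chapter 5 Hardware Implementation
      5.1 The DSP Board and Amplifier Circuit
      5.2 Interrupts
      5.3 Multi-channel Buffered Serial Port (McBSP)
      5.4 Program Architecture and Problems
    Chapter 6 Conclusion and Future Work
      6.1 Conclusion
      6.2 Future Work
    References
    Appendix: Fast Fourier Transform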


    Full text: not authorized for public release (on-campus and off-campus networks).
