Detailed Record Display

Graduate Student: 陶柏戎 (Tao, Po Jung)
Thesis Title (Chinese): 運用多個聯網麥克風進行室內環境語音與音樂之增強:波束成形方法開發與評估
Thesis Title (English): Acoustic Enhancement of Music and Speech in an Indoor Space using Multiple Networked Microphones: Development and Evaluation of Beamforming Methods
Advisor: 劉奕汶 (Liu, Yi Wen)
Oral Defense Committee: 徐正炘, 朱大舜, 李夢麟
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2016
Academic Year of Graduation: 104 (ROC calendar)
Language: Chinese
Number of Pages: 48
Keywords (Chinese): 聲學增強 (acoustic enhancement)
Keywords (English): Acoustic, Enhancement
Abstract (translated from Chinese):

Acoustic signals have long played an important role in both multimedia and communications, and enhancing them against noise interference has always been a key concern for communication products. The two most frequently discussed methods in acoustic beamforming are (1) delay and sum and (2) minimum variance distortionless response (MVDR). In this thesis, an acoustic beamforming system is built in an indoor space using the low-cost Raspberry Pi development platform together with multiple microphones to collect acoustic signals, and both methods are implemented. For signal synchronization, the concept of fractional delay is incorporated into the estimation of the time difference of arrival (TDOA) to improve its accuracy. In addition, this thesis proposes a time-domain weight-correction method that equalizes the noise power across channels to improve the enhancement. The signals recorded by different microphones are contaminated by noise of different power; this thesis assumes the noise is random and uncorrelated across channels, and better enhancement can be achieved when the noise power is equal across all signals. Experiments on signals with mutually different SNRs show that the proposed weight-correction method raises the signal-to-noise ratio (SNR) more than the conventional method, successfully reducing noise and improving the quality of indoor acoustic signals.
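To make the delay-and-sum workflow in the abstract concrete, below is a minimal Python sketch that estimates each channel's TDOA against a reference microphone by cross-correlation, refines the peak with parabolic interpolation as a simple stand-in for a fractional-delay estimate, and then aligns and averages the channels. It assumes NumPy and SciPy are available; every function and parameter name here is illustrative and not taken from the thesis.

    import numpy as np
    from scipy.signal import correlate, resample_poly

    def estimate_tdoa(ref, sig, fs, max_lag_s=0.05):
        # Cross-correlate the two channels and restrict the search to a
        # physically plausible lag range (max_lag_s is an assumed bound).
        n = min(len(ref), len(sig))
        xc = correlate(sig[:n], ref[:n], mode="full")
        lags = np.arange(-n + 1, n)
        keep = np.abs(lags) <= int(max_lag_s * fs)
        xc, lags = xc[keep], lags[keep]
        k = int(np.argmax(xc))
        # Parabolic interpolation around the peak gives a sub-sample
        # (fractional) delay estimate.
        frac = 0.0
        if 0 < k < len(xc) - 1:
            y0, y1, y2 = xc[k - 1], xc[k], xc[k + 1]
            denom = y0 - 2.0 * y1 + y2
            if denom != 0.0:
                frac = 0.5 * (y0 - y2) / denom
        return lags[k] + frac  # delay of `sig` relative to `ref`, in samples

    def delay_and_sum(channels, fs, up=8):
        # Align every channel to the first one on an upsampled grid
        # (so fractional delays can be applied) and average them.
        ref = np.asarray(channels[0], dtype=float)
        out = np.zeros_like(ref)
        for ch in channels:
            ch = np.asarray(ch, dtype=float)
            d = estimate_tdoa(ref, ch, fs)
            # np.roll wraps samples around at the edges; acceptable here
            # because this is only a short illustrative sketch.
            ch_up = np.roll(resample_poly(ch, up, 1), -int(round(d * up)))
            ch_dn = resample_poly(ch_up, 1, up)
            m = min(len(out), len(ch_dn))
            out[:m] += ch_dn[:m]
        return out / len(channels)

In practice the alignment would run on recordings collected by the networked Raspberry Pi microphones; the sketch only illustrates the per-block signal processing, not the recording or synchronization pipeline.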


Abstract (English):

Acoustic signals play an important role in both multimedia and communication, and acoustic signal enhancement using beamforming is still a challenging issue, especially for communication equipment. Two main methods are frequently used in acoustic beamforming: (1) delay and sum and (2) minimum variance distortionless response (MVDR). In this research, we construct an acoustic beamforming system that implements the two methods in an indoor environment by using multiple microphones and a low-cost single-board computer, the Raspberry Pi. For the estimation of the time difference of arrival (TDOA), a fractional delay is calculated to increase the resolution in time. In order to improve the acoustic enhancement, we propose a method that determines weights so that the noise power is leveled across different channels. We assume the noise is uncorrelated across channels, so the signal-to-noise ratio can be improved simply by summation if the signals are accurately aligned in time. The experimental results show that the proposed method of determining the weights successfully enhances the signal in the indoor environment and increases the signal-to-noise ratio (SNR) in comparison with the conventional delay-and-sum method.
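The weighting rule described above ("make the power of noise leveled across different channels") admits a simple reading: estimate each time-aligned channel's noise power from a noise-only segment and scale the channel so its noise power matches the quietest one before summation. The sketch below implements that reading; it is an interpretation under stated assumptions, not necessarily the thesis's exact formulation, and all names are illustrative.

    import numpy as np

    def noise_leveling_weights(noise_segments):
        # Per-channel noise power, estimated from noise-only segments
        # (assumed to be available for every channel).
        powers = np.array([np.mean(np.asarray(seg, dtype=float) ** 2)
                           for seg in noise_segments])
        # Gains that bring every channel's noise power down to that of
        # the quietest channel.
        return np.sqrt(powers.min() / powers)

    def weighted_delay_and_sum(aligned_channels, weights):
        # Channels are assumed to be already time-aligned and of equal
        # length; scale, sum, and renormalize so the coherent signal
        # gain stays near unity.
        x = np.stack([w * np.asarray(ch, dtype=float)
                      for w, ch in zip(weights, aligned_channels)])
        return x.sum(axis=0) / weights.sum()

For reference, with N time-aligned channels carrying the same signal and uncorrelated noise of equal power, plain averaging already raises the SNR by a factor of N (about 10 log10 N dB); under this reading, the gains are meant to keep an unusually noisy channel from eroding that benefit when the per-channel SNRs differ.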

Table of Contents:

1 Introduction
  1.1 Research Motivation
  1.2 Literature Review
  1.3 Research Direction
  1.4 Chapter Overview
2 Experimental Methods
  2.1 Delay and Sum and System Workflow
  2.2 Minimum Variance Distortionless Response
  2.3 TDOA
    2.3.1 Time-Domain Cross-Correlation
    2.3.2 Frequency-Domain Generalized Cross-Correlation
    2.3.3 Fractional Delay
  2.4 Weights
3 Experiments and Analysis of Results
  3.1 Equipment and Procedure
  3.2 Experiments and Results
    3.2.1 Recording with a Closely Spaced Linear Microphone Array
    3.2.2 Recording with a Widely Spaced Linear Microphone Array
    3.2.3 Recording with Widely Distributed Microphones
    3.2.4 Simulated Noise Scenarios
  3.3 Evaluation of Results
4 Conclusions and Future Work
  4.1 Conclusions
  4.2 Future Work


Full-Text Availability: Full text not authorized for public release (campus network)
Full-Text Availability: Full text not authorized for public release (off-campus network)
