
Graduate student: 李宗翰
Lee, Tsung-Han
Thesis title: 用於機器語音助理之聲源定位與分離
Source localization and signal extraction with application in robotic voice assistant
Advisor: 白明憲
Bai, Ming-sian R.
Oral examination committee: 劉奕汶
Liu, Yi-Wen
李昇憲
Li, Sheng-Shian
Degree: Master
Department: College of Engineering, Department of Power Mechanical Engineering
Year of publication: 2019
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 46
Chinese keywords: digital voice assistant, acoustic array system, compressive sensing
English keywords: Voice assistant, Acoustic Array System, Compressive Sensing
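  • This thesis implements a robotic voice assistant (VA) system for entertainment purposes. The system comprises four modules: a circular microphone array, a cloud-based chatbot, a linear loudspeaker array, and a motion control unit. This thesis is responsible for two parts: the real-time acoustic array audio capture system and system integration. For robot audition, the thesis proposes TDOA-DOAE, a fast sound source localization algorithm based on time difference of arrival (TDOA). It also uses compressive sensing (CS) algorithms to enhance speech quality, and both are implemented on the robotic voice assistant system. Compared with conventional beamforming methods, CS algorithms retain high spatial resolution in the low frequency band, which makes them better suited to speech signals, whose frequencies are relatively low. This property also allows the algorithms to be implemented on a small-aperture compact microphone array (CMA). We also compare several iterative CS algorithms: LASSO steepest descent, LASSO coordinate descent, orthogonal matching pursuit (OMP), and the compressive Newton’s method (CNT). OMP yields the beam pattern with the best resolution, robustness to noise, and low computational complexity, but experimental results show that its complexity is still too high for a real-time system. Computation speed is critical for a robotic voice assistant. The proposed TDOA-DOAE localization algorithm not only tracks the sound source in real time but also speeds up signal extraction. The enhanced signal is processed by the Amazon® cloud LEX service, and the response is played back to the user through the linear loudspeaker array. Motion is achieved by controlling two brushed DC motors. The entire acoustic array system and the motion control unit are implemented on the Robot Operating System (ROS) using three microcontrollers. Experimental results demonstrate that the system correctly interprets the user's intent and gives the user an immersive audio experience.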
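The abstract does not describe how TDOA-DOAE maps measured time differences to a direction. The sketch below illustrates the general far-field idea only. It estimates a source azimuth from pairwise TDOAs on a uniform circular array (UCA) by linear least squares. This is a generic textbook formulation assumed for illustration, not the thesis's algorithm. The array size, radius, speed of sound, and all function names are hypothetical.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def uca_positions(num_mics, radius):
    """(x, y) coordinates of a uniform circular array centred at the origin."""
    return [
        (radius * math.cos(2 * math.pi * m / num_mics),
         radius * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def far_field_tdoa(pos_i, pos_j, azimuth, c=SPEED_OF_SOUND):
    """TDOA tau_i - tau_j (s) of a far-field plane wave arriving from `azimuth` (rad)."""
    ux, uy = math.cos(azimuth), math.sin(azimuth)
    dx, dy = pos_i[0] - pos_j[0], pos_i[1] - pos_j[1]
    # A microphone displaced toward the source hears the wavefront earlier.
    return -(dx * ux + dy * uy) / c


def doa_from_tdoas(positions, tdoas, c=SPEED_OF_SOUND):
    """Least-squares azimuth (rad) from pairwise TDOAs {(i, j): tau_i - tau_j}.

    Each pair gives one linear equation (r_i - r_j) . u = -c * tau_ij in the
    unknown vector u pointing toward the source; the 2x2 normal equations
    are solved in closed form (Cramer's rule).
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (i, j), tau in tdoas.items():
        dx = positions[i][0] - positions[j][0]
        dy = positions[i][1] - positions[j][1]
        rhs = -c * tau
        a11 += dx * dx
        a12 += dx * dy
        a22 += dy * dy
        b1 += dx * rhs
        b2 += dy * rhs
    det = a11 * a22 - a12 * a12
    if det <= 1e-12 * (a11 + a22) ** 2:
        raise ValueError("microphone pairs do not span the plane")
    ux = (a22 * b1 - a12 * b2) / det
    uy = (a11 * b2 - a12 * b1) / det
    # Only the direction of u matters, so its length need not be exactly 1.
    return math.atan2(uy, ux)


if __name__ == "__main__":
    mics = uca_positions(num_mics=6, radius=0.05)  # hypothetical 6-mic, 5 cm UCA
    true_az = math.radians(120.0)
    pairs = {(i, j): far_field_tdoa(mics[i], mics[j], true_az)
             for i, j in combinations(range(len(mics)), 2)}
    est = doa_from_tdoas(mics, pairs)
    print(f"estimated azimuth: {math.degrees(est):.1f} deg")
```

With noisy TDOA estimates, the least-squares vector is no longer unit length. Only its direction is used, however, so `atan2` still returns a usable azimuth.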


    A robotic voice assistant (VA) is implemented for entertainment services. The robot consists of four modules: a circular microphone array, a cloud-based chatbot, a linear loudspeaker array, and a motion control unit. This thesis covers the robot audition and system integration parts of the project. Compressive sensing (CS) techniques and a proposed fast time difference of arrival-direction of arrival estimation (TDOA-DOAE) algorithm are introduced for robot audition. In contrast to conventional beamformers, CS beamformers preserve high resolution in the low frequency band. This is advantageous in speech-related applications and also makes them applicable to small-aperture compact microphone arrays (CMA). This thesis compares the performance of several iterative CS algorithms: LASSO-steepest descent (LASSO-SD), LASSO-coordinate descent (LASSO-CD), orthogonal matching pursuit (OMP), and the compressive Newton’s method (CNT). Among these, OMP with a modified stopping criterion yields the thinnest beam pattern, robustness to noise, and the lowest computational complexity. However, CS remains computationally intensive and is therefore unsuitable for real-time applications. Because a VA is a highly speed-critical device, a fast algorithm based on TDOA is proposed and implemented to track the user's position and extract voice commands in real time. The extracted voice command is processed by Amazon® LEX. The response from the cloud is played back at the robot by a four-element linear loudspeaker array. The robot's motion is controlled by two servo motors. The signal processing and motion control units are handled by three microcontrollers working in concert under the Robot Operating System (ROS). Experimental results demonstrate that the proposed robotic voice assistant interprets human commands correctly and renders immersive binaural effects.
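The abstract names OMP as the best-performing iterative CS beamformer but gives no details. The NumPy sketch below shows the standard greedy OMP formulation applied to a far-field UCA steering matrix. The residual-ratio stopping rule is an assumed stand-in for the thesis's "modified stopping criterion", which the abstract does not describe. The array geometry, frequency, scan grid, and function names are hypothetical.

```python
import numpy as np


def uca_steering_matrix(num_mics, radius, freq, azimuths, c=343.0):
    """Far-field UCA steering vectors, one column per candidate azimuth (rad)."""
    k = 2 * np.pi * freq / c
    mic_angles = 2 * np.pi * np.arange(num_mics) / num_mics
    # Phase at mic m for a plane wave from azimuth theta: k * r * cos(theta - phi_m)
    return np.exp(1j * k * radius * np.cos(azimuths[None, :] - mic_angles[:, None]))


def omp(A, y, max_atoms, tol=1e-6):
    """Orthogonal matching pursuit for y ~= A @ x with x sparse.

    Stops after `max_atoms` atoms or once ||residual|| <= tol * ||y||
    (an assumed stopping rule, not the thesis's modified criterion).
    """
    residual = y.astype(complex)
    support = []
    coef = np.zeros(0, dtype=complex)
    y_norm = np.linalg.norm(y)
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol * y_norm:
            break
        # Greedy step: pick the column most correlated with the residual.
        correlations = np.abs(A.conj().T @ residual)
        correlations[support] = 0.0  # never re-select an atom
        support.append(int(np.argmax(correlations)))
        # Orthogonal step: re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = coef
    return x


if __name__ == "__main__":
    grid = np.deg2rad(np.arange(0, 360, 2))  # hypothetical 2-degree scan grid
    A = uca_steering_matrix(8, 0.05, 2000.0, grid)  # hypothetical 8-mic, 5 cm UCA at 2 kHz
    x_true = np.zeros(grid.size, dtype=complex)
    x_true[[30, 100]] = [1.0, 0.7]  # two sources on the grid: 60 and 200 degrees
    x_hat = omp(A, A @ x_true, max_atoms=2)
    print("estimated azimuths (deg):", np.rad2deg(grid[np.abs(x_hat) > 0]))
```

Every column of a UCA steering matrix has the same norm (the square root of the number of microphones). Selecting atoms by raw correlation magnitude is therefore equivalent to selecting by normalized correlation. For dictionaries with unequal column norms, the columns should be normalized first.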

    Table of contents
    CHINESE ABSTRACT
    ABSTRACT
    ACKNOWLEDGEMENTS
    LIST OF TABLES
    LIST OF FIGURES
    CHAPTER 1 INTRODUCTION
    CHAPTER 2 COMPRESSIVE BEAMFORMER
      2.1 ESM AND THE FAR-FIELD ARRAY MODEL
      2.2 ACOUSTIC INVERSE PROBLEM
      2.3 INTRODUCTION TO CS
      2.4 RELAXATION METHODS
        2.4.1 LASSO-steepest descent (LASSO-SD)
        2.4.2 LASSO-coordinate descent (LASSO-CD)
      2.5 GREEDY ALGORITHM
        2.5.1 Orthogonal matching pursuit (OMP)
      2.6 THRESHOLDING METHODS
        2.6.1 Compressive Newton’s method (CNT)
    CHAPTER 3 FAST TDOA-DOA ESTIMATION
      3.1 PLANE-WAVE PROPAGATION MODEL
      3.2 TDOA AND THE PHASE SHIFT OF TWO SENSORS
      3.3 EXTENSION TO ARBITRARY ARRAY GEOMETRY
      3.4 UCA FORMULATION
      3.5 SIGNAL EXTRACTION
    CHAPTER 4 SYSTEM INTEGRATION
      4.1 UCA GEOMETRY
      4.2 SYSTEM ARCHITECTURE
    CHAPTER 5 RESULTS
      5.1 PERFORMANCE OF ITERATIVE CS BEAMFORMERS
      5.2 TDOA-DOAE PERFORMANCE
    CHAPTER 6 CONCLUSIONS
    REFERENCES

