Graduate student: | 許逸誠 Hsu, Yi-Cheng |
---|---|
Thesis title: | 用於語音助理之聲學信號增強與雙耳音效技術 Acoustic signal enhancement and binaural audio rendering for a voice assistant |
Advisor: | 白明憲 Bai, Ming-Sian |
Oral defense committee: | 丁川康 Ting, Chuan-Kang; 簡仁宗 Chien, Jen-Tzung |
Degree: | Master |
Department: | College of Engineering - Department of Power Mechanical Engineering |
Year of publication: | 2019 |
Academic year of graduation: | 108 |
Language: | English |
Number of pages: | 84 |
Keywords (Chinese): | 聲源定位, 粒子群最佳化, 音訊擷取, 雙耳音效 |
Keywords (English): | source localization, PSO, signal extraction, binaural audio |
This thesis combines microphone array and loudspeaker array technologies to build an acoustic array system for voice assistants. In the microphone array system, Multiple Signal Classification (MUSIC) is used to estimate the direction of arrival (DOA) of plane-wave sources. In the estimation stage, a Modified Particle Swarm Optimizer with adaptive coefficients (MPSO-AC) is proposed to search for the locations of multiple sources more efficiently. The thesis also proposes Tikhonov Regularization with Thresholding (TIKR-thresholding) to extract the source signals. The localization results show that the MPSO-AC search greatly reduces the computational load and achieves higher localization accuracy than both the conventional uniform grid search and MPSO with fixed coefficients. The separation experiments show that TIKR-thresholding significantly enhances the quality of the extracted source signals in multi-source scenarios.
The loudspeaker array system is built on a binaural audio rendering system designed with time-domain underdetermined multichannel inverse pre-filters (TUMIF). The system is applied to three techniques: cross-talk cancellation (CCS), source widening, and 5-channel virtual surround audio. The TIKR algorithm is used in the inverse filter design to limit the gain of the inverse filters. Objective and subjective test results demonstrate the efficacy of the TUMIF approach for binaural audio rendering.
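As an illustration of the MUSIC direction estimate mentioned in the abstract, the following is a minimal narrowband sketch that scans a uniform angle grid, which is the baseline search the thesis compares against. It is not the thesis implementation: the uniform linear array with half-wavelength spacing, the simulated signals, and all function names (`steering_vector`, `music_spectrum`, `pick_peaks`, `simulate`) are assumptions made for this example.

```python
import numpy as np

def steering_vector(theta_deg, n_mics, spacing_wavelengths=0.5):
    """Plane-wave steering vector for a uniform linear array (assumed geometry)."""
    m = np.arange(n_mics)
    phase = 2j * np.pi * spacing_wavelengths * m * np.sin(np.deg2rad(theta_deg))
    return np.exp(phase)

def music_spectrum(snapshots, n_sources, grid_deg, spacing_wavelengths=0.5):
    """MUSIC pseudo-spectrum; snapshots has shape (n_mics, n_snapshots)."""
    n_mics, n_snap = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n_snap
    # eigh returns eigenvalues in ascending order, so the first
    # n_mics - n_sources eigenvectors span the noise subspace.
    _, vecs = np.linalg.eigh(cov)
    noise = vecs[:, : n_mics - n_sources]
    spectrum = np.empty(len(grid_deg))
    for i, theta in enumerate(grid_deg):
        a = steering_vector(theta, n_mics, spacing_wavelengths)
        proj = noise.conj().T @ a
        # Steering vectors of true sources are nearly orthogonal to the
        # noise subspace, so the reciprocal peaks at source directions.
        spectrum[i] = 1.0 / np.real(proj.conj() @ proj)
    return spectrum

def pick_peaks(spectrum, grid_deg, n_sources):
    """Return the angles of the n_sources largest interior local maxima."""
    idx = np.arange(1, len(spectrum) - 1)
    is_peak = (spectrum[idx] > spectrum[idx - 1]) & (spectrum[idx] > spectrum[idx + 1])
    peaks = idx[is_peak]
    best = peaks[np.argsort(spectrum[peaks])[::-1][:n_sources]]
    return np.sort(grid_deg[best])

def simulate(true_deg, n_mics=8, n_snap=400, noise_std=0.1, seed=0):
    """Synthetic array data: uncorrelated complex sources plus white noise."""
    rng = np.random.default_rng(seed)
    n_src = len(true_deg)
    A = np.stack([steering_vector(t, n_mics) for t in true_deg], axis=1)
    s = rng.standard_normal((n_src, n_snap)) + 1j * rng.standard_normal((n_src, n_snap))
    n = rng.standard_normal((n_mics, n_snap)) + 1j * rng.standard_normal((n_mics, n_snap))
    return A @ s + noise_std * n

grid = np.arange(-90.0, 90.5, 0.5)   # uniform grid, 0.5 degree step
x = simulate([-20.0, 35.0])          # two hypothetical sources
estimates = pick_peaks(music_spectrum(x, 2, grid), grid, 2)
print(estimates)
```

The cost of this scan grows with the number of grid points, which is the computational load the abstract says the MPSO-AC search reduces; a swarm-based search would evaluate the same pseudo-spectrum only at particle positions instead of at every grid angle.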