| Graduate student | 陳柏瑞 Chen Bo Rui |
|---|---|
| Thesis title | Source Separation in Frequency Domain: Computation Cost Reduction and Sound Quality Enhancement |
| Advisor | 劉奕汶 Liu, Yi Wen |
| Oral defense committee | 白明憲 Bai, Ming Sian; 李夢麟 Lee, Meng Ling; 李祈均 Lee, Chi Chun |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electrical Engineering |
| Year of publication | 2017 |
| Academic year of graduation | 105 (ROC calendar, i.e. 2016–2017) |
| Language | Chinese |
| Number of pages | 52 |
| Keywords | source separation, independent component analysis (ICA), scaling problem, permutation problem |
相關次數: | 點閱:2 下載:0 |
分享至: |
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報 |
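This thesis performs source separation in the frequency domain using independent component analysis (ICA). In real environments, sounds reach the microphones as convolutive mixtures. Applying the short-time Fourier transform (STFT) greatly simplifies the ICA computation compared with the time domain, but two ambiguities affect signal reconstruction: the scaling problem and the permutation problem. This thesis proposes an algorithm that handles both problems, and it uses the time difference of arrival (TDOA) to locate the direction of each source and decide whether two sources are actually present, so that separation is not applied to a single source, which would degrade the result and waste computation time. For the scaling problem, a Gaussian mixture model (GMM) approximates the distributions of the separated signal and the mixed signal in each frequency bin, and the difference between the means associated with the largest mixture weights is used for compensation, reducing the high-frequency noise caused by the scaling problem. For the permutation problem, the method exploits the fact that the temporal variations of different frequency bands of the same source are strongly correlated, so separated signals with high mutual correlation are assigned to the same source. The method first finds a group of mutually highly correlated frequency bins to serve as a reference; each separated signal to be permuted is then compared with the reference bins to find the best ordering, which simplifies the otherwise tedious permutation procedure and reduces its computational complexity. Compared with [30], the proposed method reduces computation time by 17 seconds and increases the signal-to-interference ratio (SIR) by 4 dB, and in an on-site questionnaire rating sound quality and separation, its score is 1.4 points higher. To evaluate separation, three mixed recordings and three separated recordings were played to listeners, who wrote down the sentences they heard; for the three separated recordings, word accuracy was 41%, 26%, and 45% higher, respectively, than for the corresponding mixtures. These results show that, compared with [30], the proposed algorithm effectively reduces the complexity of the permutation step and reduces noise, improving both sound quality and computation speed.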
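The per-bin scaling compensation described above can be sketched as follows. This is a minimal sketch under assumptions that the abstract does not state: the GMM is fitted to the log-magnitudes (in dB) of the STFT coefficients in one frequency bin, a plain two-component EM fit stands in for whatever fitting procedure the thesis uses, and the difference between the largest-weight means is applied to the separated bin as a real gain. The function names `fit_gmm_1d`, `dominant_mean`, and `compensate_scale` are hypothetical.

```python
import math


def fit_gmm_1d(x, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture to samples x with plain EM.
    Returns (weights, means, variances)."""
    n = len(x)
    xs = sorted(x)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialization
    mu = sum(x) / n
    variances = [sum((v - mu) ** 2 for v in x) / n + 1e-6] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, computed in the log domain to avoid underflow
        resp = []
        for v in x:
            logp = [math.log(w) - 0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                    for w, m, s in zip(weights, means, variances)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and (floored) variances
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            variances[j] = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj + 1e-6
    return weights, means, variances


def dominant_mean(x, k=2):
    """Mean of the mixture component with the largest weight."""
    weights, means, _ = fit_gmm_1d(x, k)
    j = max(range(k), key=lambda i: weights[i])
    return means[j]


def to_db(z):
    """Log-magnitude in dB; works for real or complex STFT coefficients."""
    return [20 * math.log10(abs(c) + 1e-12) for c in z]


def compensate_scale(sep, mix, k=2):
    """Rescale one frequency bin of a separated signal so that its dominant
    log-magnitude level matches that of the mixture in the same bin.
    Returns (rescaled coefficients, applied correction in dB)."""
    diff_db = dominant_mean(to_db(mix), k) - dominant_mean(to_db(sep), k)
    gain = 10 ** (diff_db / 20)
    return [gain * c for c in sep], diff_db
```

Working in dB turns a magnitude scale factor into an additive offset, which is why, under this assumption, a difference of means is enough to undo it. The sketch corrects magnitude only and leaves phase untouched.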
In a real environment, sound sources reach the microphones as convolutive mixtures, which are difficult to separate in the time domain. Therefore, we apply independent component analysis (ICA) in the frequency domain. Frequency-domain ICA reduces the computation, but it introduces two important ambiguities, the scaling problem and the permutation problem, which affect the reconstruction of the separated sources. In this thesis, a new approach is proposed for solving both problems. In addition, the time difference of arrival (TDOA) is used to confirm that two sources are present simultaneously. To solve the scaling problem, a Gaussian mixture model (GMM) is used to approximate the distributions of the separated signal and the mixed signal in each frequency bin, and the difference between the mean of the separated signal and the mean of the mixed signal is compensated. For the permutation problem, the proposed algorithm relies on the assumption that the temporal envelopes of neighboring frequency bins from the same sound source are highly correlated. First, five neighboring frequency bins that are highly correlated with each other are selected as a reference. The permutation of the separated sources in every other frequency bin is then determined from their correlation with this reference. Compared with the approach of [30], the computation time is reduced by 17 seconds and the signal-to-interference ratio (SIR) is improved by 4 dB. In a questionnaire, the proposed method also scores higher than [30]. In a listening comprehension test with 66 subjects, the accuracy for the separated sources is 41%, 26%, and 45% higher than for the unprocessed sounds. The results show that our approach reduces computation cost and enhances sound quality compared with the existing method [30].
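The reference-based permutation step can be sketched as follows. This is a minimal illustration, not the thesis implementation, and it assumes that the temporal envelope of every separated output in every bin (for example its magnitude over frames) is already computed, that the reference bins are given and already consistently ordered, and that the reference is simply the average of their envelopes. The names `pearson`, `reference_envelopes`, `best_permutation`, and `align` are hypothetical.

```python
import itertools
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def reference_envelopes(envs, ref_bins):
    """Average the (already aligned) envelopes of the reference bins.
    envs[f][k] is the envelope of separated output k in frequency bin f."""
    n_src = len(envs[ref_bins[0]])
    n_frames = len(envs[ref_bins[0]][0])
    return [[sum(envs[f][k][t] for f in ref_bins) / len(ref_bins) for t in range(n_frames)]
            for k in range(n_src)]


def best_permutation(bin_envs, ref):
    """Ordering p of one bin's outputs (output p[k] goes to source k)
    that maximizes the summed correlation with the reference envelopes."""
    n_src = len(ref)
    return max(itertools.permutations(range(n_src)),
               key=lambda p: sum(pearson(bin_envs[p[k]], ref[k]) for k in range(n_src)))


def align(envs, ref_bins):
    """Return {bin: permutation}; reference bins keep their current order."""
    ref = reference_envelopes(envs, ref_bins)
    identity = tuple(range(len(ref)))
    return {f: identity if f in ref_bins else best_permutation(envs[f], ref)
            for f in range(len(envs))}
```

Each bin is compared only with the fixed reference rather than with every other bin, so the cost grows linearly with the number of bins, times the n! candidate orderings per bin (just 2 in the two-source case).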
[1] M. G. Lopez P., H. Molina Lozano, L. P. Sanchez F., and L. N. Oliva Moreno, “Blind Source Separation of audio signals using independent component analysis and wavelets,” in CONIELECOMP 2011, 21st International Conference on Electrical Communications and Computers, 2011, pp. 152–157.
[2] J. Nikunen and T. Virtanen, “Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation,” IEEE/ACM Trans. Audio, Speech, and Language Processing, vol. 22, no. 3, pp. 727–739, Mar. 2014.
[3] Y. Yang, Z. Li, X. Wang, and D. Zhang, “Noise source separation based on the blind source separation,” in 2011 Chinese Control and Decision Conference (CCDC), 2011, pp. 2236–2240.
[4] Y.-H. Hsiao, “Multiple Source Tracking and Separation Using MUSIC Algorithm,” Sound and Music Innovative Technologies, College of Engineering, National Chiao Tung University, 2011.
[5] L.-W. Ho, “Using Generalized Gaussian Mixture Model to Detect Sound Locations of Unknown Number of Sources for Sound Segregation,” Communication Engineering, College of Electrical and Computer Engineering, National Chiao Tung University, 2012.
[6] A. Hyvärinen and E. Oja, “Independent component analysis: algorithms and applications,” Neural Networks, vol. 13, no. 4, pp. 411–430, 2000.
[7] S. Wold, K. Esbensen, and P. Geladi, “Principal component analysis,” Chemometrics and Intelligent Laboratory Systems, 1987.
[8] J.-F. Cardoso, “Blind signal separation: statistical principles,” Proc. IEEE, vol. 86, no. 10, pp. 2009–2025, 1998.
[9] M. Zibulevsky and B. A. Pearlmutter, “Blind Source Separation by Sparse Decomposition in a Signal Dictionary,” Neural Computation, vol. 13, no. 4, pp. 863–882, Apr. 2001.
[10] Z. Chen and L. Chan, “New approaches for solving permutation indeterminacy and scaling ambiguity in frequency domain separation of convolved mixtures,” in Proc. International Joint Conference on Neural Networks (IJCNN), San Jose, California, USA, Jul. 31–Aug. 5, 2011, pp. 911–918.
[11] Y.-R. Lian, “An investigation of frequency domain ICA for speech signal separation,” Department of Electrical and Control Engineering, College of Electrical Engineering and Computer Science, National Chiao Tung University, 2004.
[12] C. Jutten and J. Herault, “Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture,” Signal Processing, vol. 24, no. 1, pp. 1–10, 1991.
[13] P. Comon, “Independent component analysis, a new concept?,” Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[14] A. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural Computation, vol. 7, no. 6, pp. 1129–1159, Nov. 1995.
[15] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Trans. Neural Networks, vol. 10, no. 3, pp. 626–634, 1999.
[16] A. Mansour and C. Jutten, “What should we say about the kurtosis?,” IEEE Signal Processing Letters, vol. 6, no. 12, pp. 321–322, Dec. 1999.
[17] P. J. Huber, “Projection pursuit,” The Annals of Statistics, vol. 13, no. 2, pp. 435–475, 1985.
[18] T. Cover and J. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
[19] M. Jones and R. Sibson, “What is projection pursuit?,” Journal of the Royal Statistical Society. Series A (General), pp. 1–37, 1987.
[20] A. Hyvärinen, “New approximations of differential entropy for independent component analysis and projection pursuit,” in Advances in Neural Information Processing Systems 10, 1998, pp. 273–279.
[21] H. Shen and K. Hüper, “Newton-Like methods for parallel independent component analysis,” in 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, 2006, pp. 283–288.
[22] S. Choi, S. Amari, A. Cichocki, and R. Liu, “Natural gradient learning with a nonholonomic constraint for blind deconvolution of multiple channels,” First International Workshop on Independent Component Analysis and Signal Separation. 1999.
[23] D. Luenberger, Optimization by vector space methods, John Wiley & Sons, Inc., 1969.
[24] K. Matsuoka, “Minimal distortion principle for blind source separation,” in Proceedings of the 41st SICE Annual Conference. SICE 2002., 2002, vol. 4, pp. 2138–2143.
[25] M. S. Lewicki and T. J. Sejnowski, “Learning overcomplete representations,” Neural Computation, vol. 12, no. 2, pp. 337–365, 2000.
[26] D. M. Malioutov, M. Çetin, and A. S. Willsky, “Homotopy continuation for sparse signal representation,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), 2005.
[27] M. Z. Ikram and D. R. Morgan, “A beamforming approach to permutation alignment for multichannel frequency-domain blind speech separation,” in IEEE International Conference on Acoustics Speech and Signal Processing, 2002, vol. 1, pp. I–881–I–884.
[28] F. Nesta, T. S. Wada, and B.-H. Juang, “Coherent spectral estimation for a robust solution of the permutation problem,” in 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009, pp. 105–108.
[29] D. Nion, K. N. Mokios, N. D. Sidiropoulos, and A. Potamianos, “Batch and adaptive PARAFAC-Based blind separation of convolutive speech mixtures,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 6, pp. 1193–1207, Aug. 2010.
[30] H.-Y. Li, “Solving the permutation problem in frequency domain source separation based on the correlation of envelopes between frequencies,” M.S. thesis, National Tsing Hua University, 2015.
[31] L. Parra and C. Spence, “Convolutive blind separation of non-stationary sources,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 3, pp. 320–327, May 2000.
[32] V. G. Reju, “Underdetermined convolutive blind source separation via time–frequency masking,” IEEE Trans. Audio, Speech, and Language Processing, vol. 18, no. 1, pp. 101–116, Jan. 2010.
[33] M. Joho, H. Mathis, and R. Lambert, “Overdetermined blind source separation: Using more sensors than source signals in a noisy mixture,” Proc. ICA. 2000.
[34] E. Bingham and A. Hyvärinen, “A fast fixed-point algorithm for independent component analysis of complex valued signals,” International Journal of Neural Systems, vol. 10, no. 1, pp. 1–8, 2000.
[35] C.-W. Li, “A probabilistic model for sound direction of arrival estimation based on signal-to-noise ratios in the frequency domain,” M.S. thesis, National Tsing Hua University, 2015.
[36] E. Vincent, R. Gribonval, and C. Fevotte, “Performance measurement in blind audio source separation,” IEEE Trans. Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, Jul. 2006.
[37] R. Mazur, J. O. Jungmann, and A. Mertins, “A new clustering approach for solving the permutation problem in convolutive blind source separation,” in 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013, pp. 1–4.
[38] http://www.kecl.ntt.co.jp/icl/signal/sawada/demo/bss2to4/index.html.
[39] https://www.google.com/intl/en/chrome/demos/speech.html