Source Separation in the Frequency Domain: Solving the Permutation Problem by a Sliding K-means Method

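Graduate student:	陳邦尹 Chen, Bang-Yin
Thesis title:	頻域上之聲源分離: 利用滑動k-平均演算法解決排列問題 Source Separation in the Frequency Domain: Solving the Permutation Problem by a Sliding K-means Method
Advisor:	劉奕汶 Liu, Yi-Wen
Oral defense committee:	黃朝宗 Huang, Chao-Tsung; 曹昱 Tsao, Yu; 林守德 Lin, Shou-De
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2018
Academic year of graduation:	107 (ROC calendar)
Language:	English
Number of pages:	61
Keywords (Chinese):	independent component analysis, scaling problem, permutation problem
Keywords (foreign language):	component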

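This thesis performs source separation in the frequency domain. In real environments, the sources reach the microphones as convolutive mixtures. To reduce the computational complexity, earlier work transformed the mixed signals into the time-frequency domain with the short-time Fourier transform and then separated the signals in each frequency bin by independent component analysis. However, the separated signals are then subject to the scaling problem and the permutation problem. The permutation problem is the more complicated of the two and is therefore the focus of this thesis. For the permutation problem, earlier work developed a correlation method based on the high correlation between the energy envelopes at different frequencies of the same source. The sliding k-means method proposed in this thesis is compared with that method and analyzed. After independent component analysis and the resolution of these two follow-up problems, the un-mixing matrix in each frequency bin can be obtained, and the mixing matrix, computed from the measured frequency response of the actual environment, serves as the ground truth. In theory, the two matrices should be inverses of each other. This thesis therefore devises a scoring system that examines how strongly the product of the two matrices is concentrated on its diagonal, and defines two objective indices to quantify and evaluate the separation results. In the experiments, the singers are divided into three groups according to gender combination. The k-means method reaches a permutation accuracy of 90.5%, and adding the sliding process generally raises the permutation accuracy by a further 1% to 3%. The earlier correlation method, on the other hand, can reach a higher permutation accuracy but is easily affected by parameter settings and is therefore less stable. These results show that the method proposed in this thesis for the permutation problem performs comparably to the earlier method while being more stable.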
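The inverse relationship between the un-mixing and mixing matrices can be illustrated with a small NumPy sketch. The abstract does not give the thesis's scoring formula or its two objective indices, so the function `diagonal_concentration` below is a hypothetical index chosen only for illustration (the share of the energy of the product that lies on the main diagonal), and the 2x2 matrix `H` is made-up data for a single frequency bin.

```python
import numpy as np

def diagonal_concentration(W, H):
    """Hypothetical index: share of the energy of P = W @ H on its main diagonal.

    W is an un-mixing matrix and H a mixing matrix for one frequency bin.
    A value near 1 means P is nearly diagonal (sources in the right order);
    a value near 0 for two sources means the outputs are swapped.
    """
    P = np.abs(W @ H) ** 2
    return float(np.trace(P) / P.sum())

# Made-up complex mixing matrix for one frequency bin (assumption, not thesis data).
H = np.array([[1.0, 0.5j],
              [0.3, 1.0]])

W_ideal = np.linalg.inv(H)                      # perfect un-mixing
W_scaled = np.diag([2.0, 0.5j]) @ W_ideal       # scaling ambiguity only
W_swapped = W_ideal[::-1]                       # permutation ambiguity (rows exchanged)

score_ideal = diagonal_concentration(W_ideal, H)
score_scaled = diagonal_concentration(W_scaled, H)
score_swapped = diagonal_concentration(W_swapped, H)
```

Under this assumed index, a per-bin rescaling of the outputs leaves the score unchanged, while exchanging the two outputs moves the energy off the diagonal, so the index reacts to the permutation problem but not to the scaling problem.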

This thesis addresses the source separation problem in the frequency domain. In a real environment, the sources reach the microphones as convolutive mixtures. Previous work indicates that convolutive mixtures are easier to separate in the two-dimensional time-frequency domain, after the short-time Fourier transform (STFT) has been applied to the signals. Independent component analysis (ICA) is then used to separate the sources in each frequency bin. However, this leaves two indeterminacies to resolve, namely the scaling problem and the permutation problem. The latter is the focus of this thesis. For the permutation problem, the existing correlation method and the proposed sliding k-means method are compared; both rest on the assumption that the temporal envelopes of neighboring frequency bins from the same source are highly correlated. After ICA and the resolution of these two problems, the un-mixing matrix can be calculated. To evaluate the performance, we measured the frequency response of the environment and obtained the mixing matrix, which serves as the ground truth. A scoring system that combines both matrices, together with two objective indices, is then defined to quantify and evaluate the separation performance. In our experiments, the singers are divided into three groups (male+male, female+female, male+female). Across the three groups, the k-means method reaches a permutation accuracy of at least 90.5% over different parameter settings. After the "sliding process" is introduced, the permutation accuracy generally rises by 1% to 3%. The correlation method, on the other hand, can reach a higher permutation accuracy than the k-means method but is sensitive to parameter variations and is considerably less stable. The results show that the new approach is stable and yields comparable performance.
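The envelope assumption behind both methods can be sketched in a few lines of NumPy. This is not the thesis's sliding k-means method, whose details the abstract does not give; it is a plain two-cluster centroid alignment under stated assumptions: two sources, synthetic envelopes, and a hypothetical helper `align_permutations` that decides for each frequency bin whether to keep or swap the two ICA outputs by correlating their envelopes with two running centroids.

```python
import numpy as np

def align_permutations(env, n_iter=10):
    """Centroid-based permutation alignment for two sources (illustrative sketch).

    env: array of shape (F, 2, T) holding the temporal envelopes of the two
         separated outputs in each of F frequency bins.
    Returns a boolean array of length F; True means "swap the outputs of this bin".
    """
    # Zero-mean, unit-norm envelopes, so dot products are correlation coefficients.
    z = env - env.mean(axis=2, keepdims=True)
    z /= np.linalg.norm(z, axis=2, keepdims=True)

    centroids = z[0].copy()                  # initialise from the first bin
    swap = np.zeros(len(z), dtype=bool)
    for _ in range(n_iter):
        # c[f, i, j]: correlation of output i in bin f with centroid j.
        c = np.einsum('fit,jt->fij', z, centroids)
        new_swap = (c[:, 0, 1] + c[:, 1, 0]) > (c[:, 0, 0] + c[:, 1, 1])
        converged = np.array_equal(new_swap, swap)
        swap = new_swap
        # Update the centroids from the envelopes in their aligned order.
        aligned = np.where(swap[:, None, None], z[:, ::-1, :], z)
        centroids = aligned.mean(axis=0)
        centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
        if converged:
            break
    return swap

# Synthetic demo data (assumption): two uncorrelated envelopes, randomly swapped per bin.
rng = np.random.default_rng(0)
T, F = 200, 16
t = np.arange(T)
src = np.stack([1 + np.sin(2 * np.pi * 3 * t / T),
                1 + np.cos(2 * np.pi * 5 * t / T)])
true_swap = rng.random(F) < 0.5
true_swap[0] = False                          # bin 0 defines the reference order
env = np.empty((F, 2, T))
for f in range(F):
    e = src * rng.uniform(0.5, 2.0) + 0.05 * rng.standard_normal((2, T))
    env[f] = e[::-1] if true_swap[f] else e

est_swap = align_permutations(env)
permutation_accuracy = float(np.mean(est_swap == true_swap))
```

The sketch uses one global pair of centroids for all bins, which is the simplest choice; with real recordings the envelope similarity is expected mainly between neighboring bins, so a global centroid is a stronger assumption than the one stated in the abstract.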

Abstract
Abstract (Chinese)
Contents
List of Figures
List of Tables

Introduction
1    Motivation and Purpose
2    Research Methods
3    Source Separation Systems
3.1    Determined Systems
3.2    Underdetermined Systems
3.3    Overdetermined Systems
4    Research Goals
5    Organization of This Thesis

Independent Component Analysis (ICA)
1    Background Knowledge of ICA
2    Hypotheses
3    Pretreatment of Singing Signals
3.1    Centering
3.2    De-correlation
4    Measure of Non-Gaussianity
4.1    Kurtosis
4.2    Negentropy
4.3    Approximations of Negentropy
5    Optimization Algorithm
6    Uncertain Factors
6.1    The Scaling Problem
6.2    The Permutation Problem

Solving Uncertain Factors
1    Solving the Scaling Problem
2    Solving the Permutation Problem
2.1    The Correlation Method
2.2    The Sliding K-means Method
3    Flow Diagram of the System

Experiments and Results
1    Equipment and the Environment
2    Evaluation Methods
2.1    Measuring the Frequency Response of the Environment
2.2    Defining a New Scoring System
2.3    Pitch Detection by YIN Algorithm
3    Results

Conclusion and Future Work
1    Conclusion
2    Future Work

References

Appendix
A.1    Derivation of Independent Component Analysis in the Complex Domain
A.2    Suggestions from the Committees




                                

