| Field | Value |
|---|---|
| Graduate Student | 張鈞閔 Chang, Chun-Min |
| Thesis Title | 運用對抗式學習網路結合多觀點輔助開發強健性聲音情緒識別 Development of Robust Speech Emotion Recognition using Adversarial Learning Network with Multiple Perspective Auxiliary |
| Advisor | 李祈均 Lee, Chi-Chun |
| Committee Members | 劉奕汶 Liu, Yi-Wen; 馬席彬 Ma, Hsi-Pin; 曹昱 Tsao, Yu; 古倫維 Ku, Lun-Wei |
| Degree | 博士 Doctor |
| Department | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
| Year of Publication | 2022 |
| Graduation Academic Year | 110 |
| Language | English |
| Number of Pages | 101 |
| Keywords (Chinese) | 跨資料庫情緒辨識, 對抗式學習, 轉移學習, 領域適應 |
| Keywords (English) | Cross-corpus emotion recognition, Adversarial learning, Transfer learning, Domain adaptation |
Speech emotion recognition extracts emotional information from speech to help machines perceive human emotion and, in turn, understand a speaker's mental state. However, because large amounts of labeled emotion data are difficult to collect and the variability of context-dependent emotion is hard to capture, the complexity of contexts makes speech emotion recognition even more difficult to model. In real-world deployment, the biggest challenge is that we lack sufficient emotion data to train a reliable emotion model. In this dissertation, we focus on developing robust speech emotion recognition for commercial applications using adversarial learning networks combined with multiple-perspective auxiliary learning. By applying the concept of multiple perspectives to emotion recognition built on domain transfer and transfer learning, the target database can exploit information from source databases for better training, enabling the practical deployment of speech emotion recognition through cross-corpus emotion recognition modeling.
In this dissertation, we propose two frameworks that combine adversarial learning with multiple-perspective auxiliary learning from different angles: cross-corpus integration and domain adaptation. First, we propose a cross-corpus integration method that models a low-resource target database directly by fusing information from multiple perspectives; adversarial information from external databases assists in learning an enhanced acoustic vector for the in-domain emotion database, improving emotion recognition accuracy. Second, starting from domain adaptation, we propose the ADESCO network to address the mismatch and emotional distortion between databases, establishing a shared distribution to ensure that domain transfer proceeds properly; in addition, we incorporate the concept of multiple perspectives to connect the source and target databases directly, constraining the domain-transfer algorithm to find the correct discrepancy and thereby improving emotion recognition accuracy. Our results compare favorably with state-of-the-art algorithms, and a series of analyses demonstrates the effectiveness and feasibility of the two frameworks.
Speech emotion recognition (SER) has been developed to derive emotion information from speech, helping machines perceive emotion and assess a speaker's state of mind.
However, the complexity of contexts makes SER more difficult to model, owing to the scarcity of large-scale labeled emotion corpora and the variability of context-dependent emotion.
In this dissertation, we address the development of robust speech emotion recognition for commercial applications. By introducing the concept of alternative perspectives into two lines of SER research, cross-corpus integration and domain adaptation, the target emotion database exploits perspectives from other emotion databases to train a better model, making cross-corpus emotion recognition a viable path toward commercial deployment.
We propose two networks that combine adversarial training with multiple perspectives from distinct angles: cross-corpus integration and domain adaptation.
First, we propose a cross-corpus integration method that applies multiple emotion perspectives to strengthen emotion recognition on data-scarce databases: a transfer method learns an enriched acoustic vector for the data-scarce database through adversarial learning from large corpora. Our results outperform those of the state of the art, and a series of analyses shows the efficacy of the proposed model.
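As a concrete illustration of this first direction, the following is a minimal PyTorch-style sketch of how an acoustic code vector could be enriched adversarially against a large out-of-domain corpus. The module names, feature dimensions, and the gradient-reversal formulation are assumptions for illustration only, not the exact architecture of the dissertation.

```python
# Minimal sketch: adversarially enriched acoustic representation across corpora.
# Hypothetical module names and dimensions; not the thesis' exact model.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, negated gradient backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class AdversarialEnrichedSER(nn.Module):
    def __init__(self, feat_dim=88, hidden=256, n_emotions=4, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        # Shared acoustic encoder producing the "enriched" code vector.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        # Emotion classifier trained on the small in-domain corpus.
        self.emotion_head = nn.Linear(hidden, n_emotions)
        # Corpus discriminator: in-domain vs. large out-of-domain corpus.
        self.corpus_head = nn.Linear(hidden, 2)

    def forward(self, x):
        z = self.encoder(x)
        emo_logits = self.emotion_head(z)
        # Reversed gradients push the encoder toward corpus-invariant codes.
        dom_logits = self.corpus_head(GradReverse.apply(z, self.lambd))
        return emo_logits, dom_logits

# Usage: emotion loss on labeled target data plus corpus loss on the corpus union.
model = AdversarialEnrichedSER()
x = torch.randn(8, 88)  # e.g., utterance-level acoustic functionals (assumed 88-dim)
emo_logits, dom_logits = model(x)
loss = nn.CrossEntropyLoss()(emo_logits, torch.randint(0, 4, (8,))) \
     + nn.CrossEntropyLoss()(dom_logits, torch.randint(0, 2, (8,)))
loss.backward()
```

The design intent is simply that the discriminator's reversed gradient discourages corpus-specific cues in the latent code, so the small in-domain corpus benefits from the large external one.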
Second, the proposed ADESCO network addresses the mismatch and semantic distortion between emotion databases. It generalizes a shared distribution and gathers the source perspective, connecting the source and target directly through anchors to constrain the classifier's distribution and thereby improve SER performance.
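For this second direction, the sketch below illustrates a discrepancy-based adversarial adaptation loop with a shared encoder and two classifiers, in the spirit of the adversarial discrepancy learning described above. All names are hypothetical, and the anchor pairing and exact loss terms of ADESCO are not reproduced here.

```python
# Minimal sketch: discrepancy-based adversarial domain adaptation for SER.
# Conceptual only; each step would use its own optimizer over the named modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(88, 256), nn.ReLU(), nn.Linear(256, 128), nn.ReLU())
clf_a = nn.Linear(128, 4)   # two emotion classifiers used to measure discrepancy
clf_b = nn.Linear(128, 4)

def discrepancy(p, q):
    """L1 distance between the two classifiers' softmax outputs."""
    return (F.softmax(p, dim=1) - F.softmax(q, dim=1)).abs().mean()

x_src, y_src = torch.randn(8, 88), torch.randint(0, 4, (8,))   # labeled source batch
x_tgt = torch.randn(8, 88)                                     # unlabeled target batch

# Step 1: fit encoder and both classifiers on labeled source data.
z_src = encoder(x_src)
loss_cls = F.cross_entropy(clf_a(z_src), y_src) + F.cross_entropy(clf_b(z_src), y_src)

# Step 2 (update classifiers only): maximize disagreement on target samples
# to expose where the target falls outside the source distribution.
z_tgt = encoder(x_tgt).detach()
loss_clf = loss_cls - discrepancy(clf_a(z_tgt), clf_b(z_tgt))

# Step 3 (update encoder only): minimize disagreement so target features move
# into the shared distribution supported by both classifiers.
z_tgt = encoder(x_tgt)
loss_enc = discrepancy(clf_a(z_tgt), clf_b(z_tgt))

# An additional anchor term could pull target features toward semantically
# matched source features to keep the transfer consistent (assumption here).
```

The alternation of maximizing and then minimizing classifier disagreement is the standard discrepancy-adversarial recipe; the abstract's anchor mechanism would act as an extra constraint tying source and target perspectives together.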