Graduate student: | 許暐彤 Hsu, Wei-Tung |
---|---|
Thesis title: | 保護易受傷害人群隱私的自動語音辨識: 以阿茲海默症患者為例 Privacy-Preserved Automatic Speech Recognition for Vulnerable Populations: A Case Study of People with Alzheimer's Disease |
Advisor: | 李祈均 Lee, Chi-Chun |
Committee members: | 王新民 Wang, Hsin-Min; 曹昱 Tsao, Yu; 周志遠 Chou, Jerry |
Degree: | 碩士 Master |
Department: | 電機資訊學院 - 電機工程學系 Department of Electrical Engineering |
Year of publication: | 2024 |
Academic year of graduation: | 112 |
Language: | English |
Number of pages: | 27 |
Chinese keywords: | 自動語音辨識、阿茲海默症、隱私保護、聯邦學習、異質性 |
English keywords: | Automatic speech recognition, Alzheimer’s disease, privacy preservation, federated learning, heterogeneity |
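Automatic speech recognition (ASR) systems designed for vulnerable populations, such as people with Alzheimer's disease, can help medical institutions digitize and automate diagnosis and treatment, and privacy protection during both model use and model training is especially important. During model use, privacy can be protected by masking the dimensions of the feature vector that carry sensitive information, but existing methods require a manually set threshold to generate a mask vector containing only 1s and 0s. Cluster-based federated learning protects privacy during training and addresses the heterogeneity problem that federated learning often encounters: it groups similar data samples into clusters and trains a customized model for each cluster without exposing personal information. Existing methods, however, often neglect the design of the clustering criterion. We propose two algorithms for ASR systems that protect the privacy of people with Alzheimer's disease. The health information exclusion algorithm adds a toggling network, trained end to end, to the system and generates mask vectors that remove sensitive information without a manually set threshold. Compared with an unprotected baseline model, this method raises dementia protection efficacy to 33.33% while sacrificing only 0.1% in word error rate. The confidentiality-ensured inter-institutional collaboration algorithm uses cluster-based federated learning to cluster audio recordings by their character distributions, improving system performance with a word error rate improvement of up to 4.67% over traditional federated learning. This method reduces within-cluster heterogeneity in pause usage, and the way pauses are used may have an important influence on ASR systems designed for people with Alzheimer's disease.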
Adapting automatic speech recognition (ASR) for vulnerable populations, such as people with Alzheimer’s disease (AD), while ensuring privacy during both inference and training, is important for medical institutions to digitize and automate treatment. Privacy during inference can be preserved by masking sensitive nodes in model embeddings, but a recent method relies on manually set thresholds to produce hard decision masks. Cluster-based federated learning (FL) preserves privacy during training and addresses heterogeneity by clustering similar samples to train cluster-specific models without sharing personal information, though existing approaches often neglect the importance of clustering metrics. We introduce two algorithms for privacy-preserved AD-oriented ASR. The health information exclusion (HIE) algorithm adds an end-to-end trained toggling network to the ASR model to conceal medical conditions, enhancing dementia protection efficacy (DPE) by 33.33% with a slight word error rate (WER) increase of 0.1% compared to an unprotected baseline. The confidentiality-ensured inter-institutional collaboration (CIC) algorithm uses cluster-based FL to group samples with similar token distributions and train cluster-specific models, achieving up to a 4.67% reduction in WER compared to traditional FL. This reduction is achieved by lowering the within-cluster heterogeneity in pause usage, which might be a key factor for AD-oriented ASR.
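The idea of grouping samples with similar token distributions can be illustrated with a minimal sketch. It is not the thesis's CIC implementation: the character vocabulary, the use of "." as a pause marker, the toy transcripts, and the choice of plain k-means with a fixed initialisation are all assumptions made for illustration, and the federated training of cluster-specific models is omitted.

```python
from collections import Counter

# Hypothetical vocabulary: lowercase letters plus "." as an assumed pause marker.
VOCAB = list("abcdefghijklmnopqrstuvwxyz.")

def token_distribution(transcript, vocab=VOCAB):
    """Relative frequency of each vocabulary token in one transcript."""
    counts = Counter(ch for ch in transcript if ch in vocab)
    total = sum(counts.values()) or 1  # guard against empty transcripts
    return [counts[tok] / total for tok in vocab]

def squared_distance(p, q):
    """Squared Euclidean distance between two distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every sample to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Toy transcripts (invented): two without pauses, two with frequent pauses.
transcripts = [
    "the boy takes a cookie",
    "the . boy . . takes . . a . cookie",
    "the girl asks for a cookie",
    "the . . girl . asks . . for . a cookie",
]
points = [token_distribution(t) for t in transcripts]
assignment = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1]
```

In this toy run the pause-heavy transcripts land in one cluster and the pause-free ones in the other, because the pause token dominates the distance between distributions. In a cluster-based FL setting, each resulting cluster would then train its own model.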