
Graduate student: Hsu, Wei-Tung (許暐彤)
Thesis title: Privacy-Preserved Automatic Speech Recognition for Vulnerable Populations: A Case Study of People with Alzheimer's Disease (保護易受傷害人群隱私的自動語音辨識: 以阿茲海默症患者為例)
Advisor: Lee, Chi-Chun (李祈均)
Oral defense committee: Wang, Hsin-Min (王新民); Tsao, Yu (曹昱); Chou, Jerry (周志遠)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2024
Academic year of graduation: 112
Language: English
Number of pages: 27
Keywords (Chinese): 自動語音辨識, 阿茲海默症, 隱私保護, 聯邦學習, 異質性
Keywords (English): Automatic speech recognition, Alzheimer’s disease, privacy preservation, federated learning, heterogeneity
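  • Automatic speech recognition (ASR) systems designed for vulnerable populations, such as people with Alzheimer's disease (AD), can help medical institutions digitize and automate diagnosis and treatment, and privacy protection during both model use and model training is especially important. During model use, privacy can be protected by masking the dimensions of the feature vector that carry sensitive information, but existing methods require a manually set threshold to generate a mask vector containing only 1s and 0s. Cluster-based federated learning protects privacy during training and addresses the heterogeneity problem that federated learning often faces. It groups similar data samples into clusters and trains a customized model for each cluster without exposing personal information. However, existing methods often neglect the design of the clustering criterion. We propose two algorithms for privacy-preserving ASR for people with AD. The health information exclusion algorithm adds an end-to-end trained toggling network to the system, generating mask vectors that remove sensitive information without a manually set threshold. Compared with the unprotected baseline model, this method raises dementia protection efficacy to 33.33% while sacrificing only 0.1% in word error rate. The confidentiality-ensured inter-institutional collaboration algorithm uses cluster-based federated learning to cluster audio recordings by their token distribution, improving system performance, with a word error rate improvement of up to 4.67% over traditional federated learning. This method reduces within-cluster heterogeneity in pause usage, and the way pauses are used may have an important effect on ASR systems designed for people with AD.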
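As a rough illustration of the masking idea in the abstract, the Python sketch below contrasts a hard 0/1 mask built from a manually set threshold with a soft gate that needs no threshold. The sensitivity scores, the logits, and all function names are hypothetical; the sigmoid gate is only a stand-in and is not the thesis's toggling network.

```python
import math

def hard_mask(sensitivity, threshold):
    """Binary mask: 0.0 drops a dimension whose score exceeds the manual threshold."""
    return [0.0 if s > threshold else 1.0 for s in sensitivity]

def soft_gate(logits, temperature=1.0):
    """Sigmoid gate with values in (0, 1); a stand-in for a learned gating output."""
    return [1.0 / (1.0 + math.exp(-z / temperature)) for z in logits]

def apply_mask(embedding, mask):
    """Element-wise product of an embedding vector and a mask of the same length."""
    return [e * m for e, m in zip(embedding, mask)]

# Hypothetical 4-dimensional frame embedding and per-dimension sensitivity scores.
embedding = [0.5, 1.2, 2.0, 0.3]
sensitivity = [0.1, 0.9, 0.4, 0.8]

# Threshold-based masking: dimensions 1 and 3 are zeroed out.
masked_hard = apply_mask(embedding, hard_mask(sensitivity, threshold=0.5))

# Gate-based masking: every dimension is scaled by a value in (0, 1).
gate_logits = [4.0, -4.0, 4.0, -4.0]  # assumed to come from a trained network
masked_soft = apply_mask(embedding, soft_gate(gate_logits))
```

The hard mask depends entirely on the chosen threshold, whereas the gate values could in principle be learned jointly with the recognizer, which is the motivation the abstract gives for avoiding manual thresholds.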


    Adapting automatic speech recognition (ASR) for vulnerable populations, such as people with Alzheimer’s disease (AD), while ensuring privacy during both inference and training, is important for medical institutions seeking to digitize and automate treatment. Privacy during inference can be preserved by masking sensitive nodes in model embeddings, but a recent method relies on manually set thresholds to produce hard-decision masks. Cluster-based federated learning (FL) preserves privacy during training and addresses heterogeneity by clustering similar samples to train cluster-specific models without sharing personal information, though existing approaches often neglect the importance of clustering metrics. We introduce two algorithms for privacy-preserved AD-oriented ASR. The health information exclusion (HIE) algorithm adds an end-to-end trained toggling network to the ASR model to conceal medical conditions, enhancing dementia protection efficacy (DPE) by 33.33% with a slight word error rate (WER) increase of 0.1% compared to an unprotected baseline. The confidentiality-ensured inter-institutional collaboration (CIC) algorithm uses cluster-based FL to group samples with similar token distributions and train cluster-specific models, achieving up to a 4.67% reduction in WER compared to traditional FL. This reduction is achieved by lowering the within-cluster heterogeneity in pause usage, which might be a key factor for AD-oriented ASR.
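To make the cluster-based FL idea in the abstract more concrete, here is a minimal Python sketch: each sample is summarized by its token distribution, assigned to the nearest cluster centroid, and client updates within a cluster are combined by sample-weighted averaging in the style of FedAvg. The vocabulary, the centroids, the L1 distance, and all function names are assumptions made for illustration; the thesis's actual clustering metric (TokDiv) is not defined in this record and is not reproduced here.

```python
from collections import Counter

# Hypothetical token set; "|" stands in for a pause or word-boundary token.
VOCAB = ["a", "b", "|"]

def token_distribution(tokens, vocab=VOCAB):
    """Normalized frequency of each vocabulary token in one sample."""
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocab)
    return [counts[t] / total if total else 0.0 for t in vocab]

def l1_distance(p, q):
    """Sum of absolute differences between two distributions."""
    return sum(abs(x - y) for x, y in zip(p, q))

def assign_cluster(dist, centroids):
    """Index of the centroid closest to the given distribution."""
    return min(range(len(centroids)), key=lambda k: l1_distance(dist, centroids[k]))

def fed_avg(client_updates):
    """Sample-weighted mean of client parameter vectors: [(weights, n_samples), ...]."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(w[i] * n for w, n in client_updates) / total for i in range(dim)]

# Assumed centroids: cluster 0 has no pauses, cluster 1 has many pauses.
centroids = [[0.5, 0.5, 0.0], [0.25, 0.25, 0.5]]

sample = ["a", "|", "b", "|"]
dist = token_distribution(sample)          # [0.25, 0.25, 0.5]
cluster = assign_cluster(dist, centroids)  # 1

# Two clients in the same cluster share only parameters and sample counts.
cluster_model = fed_avg([([1.0, 2.0], 1), ([3.0, 4.0], 3)])  # [2.5, 3.5]
```

In this sketch only parameter vectors and sample counts leave a client, never the audio or transcripts, and samples with similar pause usage end up training the same cluster-specific model.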

    Acknowledgements
    Abstract (Chinese)
    Abstract
    1 Introduction
    2 HIE algorithm
      2.1 Dataset
      2.2 Methodology
        2.2.1 Training of the toggling network
        2.2.2 Structure of the toggling network
      2.3 Experiments and results
        2.3.1 Experimental setup
        2.3.2 Experimental result
    3 CIC algorithm
      3.1 Dataset
      3.2 Methodology
        3.2.1 Cluster-based FL
        3.2.2 Token Diversity (TokDiv)
      3.3 Experiments and results
        3.3.1 Experimental setup
        3.3.2 Experimental result
    4 Conclusion
    References

    List of Figures
      2.1 HIE algorithm adds a toggling network in ASR to hide dementia status in framewise embeddings. The network is trained with a dual-branch structure in an end-to-end manner.
      2.2 UMAP plots of masked embeddings from AD-free and ASR-free branch
      3.1 The CIC algorithm groups samples with similar TokDiv into a cluster. Clients then train these samples federally to form a model for decoding other samples in the same cluster.
      3.2 Isomap plots of diverse and random client settings with client and cluster labels

    List of Tables
      2.1 Demographics of healthy controls (HC) and Alzheimer’s diseased (AD) for HIE
      2.2 ASR and privacy preservation performances of different models
      2.3 ASR and privacy preservation performances for ablation study
      3.1 Demographics of data among server and clients for CIC
      3.2 ASR performances (WER %) of models in diverse and random client settings
      3.3 Characteristic of each cluster

