
Student: Huang, Ting-Jie (黃庭頡)
Title: Novel Privacy Amplification Algorithm and its Convergence Analysis for Global Model in Federated Learning (全局模型隱私保護放大演算法及收斂性分析)
Advisor: Huang, Scott C.-H. (黃之浩)
Committee Members: Chi, Chong-Yung (祁忠勇); Kuan, Yen-Cheng (管延城)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Communications Engineering
Year of Publication: 2024
Graduation Academic Year: 112
Language: Chinese
Number of Pages: 46
Keywords (Chinese): 聯邦學習, 隱私放大, 全局模型
Keywords (English): federated learning, privacy amplification, global model
  • This thesis investigates the problem of data privacy protection in federated learning. In an FL system, transmitting the trained models carries an inherent risk of privacy leakage. Differential privacy (DP) is a widely used privacy-protection technique in FL systems; it works by adding carefully designed noise to the models trained by the clients. However, the added noise can degrade learning performance. This thesis therefore proposes a privacy amplification algorithm that reduces the negative impact on learning while still satisfying the required privacy level.
    To address this problem, a series of studies have proposed privacy amplification techniques that provide privacy guarantees without sacrificing performance. The algorithm in this thesis amplifies the privacy protection of the global model by augmenting the original algorithm with a client participation mechanism and data subsampling. Data subsampling strengthens the privacy protection of the local models and thereby that of the global model, while random client participation further boosts the privacy protection of the global model. In addition, we carry out a convergence analysis of the algorithm, proving that it converges and analyzing its impact on training.
    The results show that the proposed algorithm improves privacy protection without sacrificing additional training performance. Hence, among related studies, the algorithm both trains effectively and strengthens privacy protection, making it a useful reference for future research.
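
    As a concrete illustration of the noise-adding step described in the abstract, the following is a minimal Python sketch, not the thesis's actual implementation, of how a client in a DP federated-learning system might clip and perturb its local model update before uploading it. The function name and the parameters clip_norm and noise_multiplier are illustrative assumptions.

        import numpy as np

        def privatize_local_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
            # Clip the update to L2 norm clip_norm (bounding its sensitivity),
            # then add Gaussian noise scaled to that bound -- the Gaussian
            # mechanism commonly used in DP federated learning.
            rng = rng or np.random.default_rng()
            norm = np.linalg.norm(update)
            clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
            noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
            return clipped + noise

        # Illustrative use: a client privatizes its update before the server
        # aggregates it into the global model.
        local_update = np.random.randn(10)
        noisy_update = privatize_local_update(local_update, clip_norm=1.0, noise_multiplier=1.2)

    A larger noise_multiplier gives a stronger privacy guarantee at the cost of learning performance, which is exactly the trade-off the thesis's amplification algorithm aims to relax.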


    This thesis explores the issue of data privacy protection in federated learning (FL). Within the FL framework, transmitting the trained models poses inherent privacy leakage risks. Differential privacy (DP) is a widely used technique for protecting privacy in FL systems; it is implemented by adding artificial noise to the local models/data. Nevertheless, the added noise may adversely affect learning performance.

    To mitigate the resulting performance loss, privacy amplification techniques have been developed that guarantee privacy without sacrificing learning performance. This thesis presents privacy amplification for the global model in FL systems by jointly considering the impact of client sampling and local data subsampling. Data subsampling improves the privacy protection of the local models and consequently that of the global model, while random client participation effectively boosts the privacy protection of the global model. Additionally, we conduct a convergence analysis of the proposed algorithm, proving that it converges and analyzing its impact on training. The results indicate that, compared to the original algorithms, the proposed method enhances privacy protection without significantly sacrificing training performance. The algorithm thus trains effectively while strengthening privacy protection, serving as a valuable reference for future studies.
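
    The amplification effect described above can be made concrete with the classical subsampling bound from the privacy-amplification literature: running an ε-DP mechanism on a Poisson subsample with sampling rate q yields ε' = ln(1 + q(e^ε − 1)) ≤ qε. The sketch below computes this bound and, purely as a heuristic illustration (the thesis derives its own joint bound for the global model), applies it once for local data subsampling and once for random client participation; all numbers are hypothetical.

        import math

        def amplified_eps(eps, q):
            # Classical privacy amplification by subsampling: an eps-DP
            # mechanism run on a Poisson subsample with rate q satisfies
            # eps' = ln(1 + q * (exp(eps) - 1)) <= q * eps.
            return math.log(1.0 + q * (math.exp(eps) - 1.0))

        eps_local = 2.0              # hypothetical per-client guarantee
        q_data, q_client = 0.1, 0.2  # hypothetical sampling rates
        eps_data = amplified_eps(eps_local, q_data)      # after data subsampling
        eps_global = amplified_eps(eps_data, q_client)   # after random client participation
        print(f"after data subsampling: {eps_data:.3f}, global model: {eps_global:.3f}")

    Smaller sampling rates q give smaller effective ε, which is why combining data subsampling with random client participation can strengthen the global model's guarantee without adding more noise.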

    Table of Contents

    Abstract (Chinese) i
    Abstract ii
    Acknowledgments (Chinese) iii
    Table of Contents iv
    List of Figures vi
    1 Introduction 1
    1.1 Background 1
    1.2 Motivation 2
    1.3 Our Contributions 3
    2 System Model 4
    2.1 Problem Formulation 4
    2.2 Preliminaries of DP 6
    3 Privacy Amplification and Convergence Analysis 9
    3.1 Data Subsampling 9
    3.2 Random Client Check-in Scheme 10
    3.3 Privacy Amplification for Global Model 11
    3.4 Convergence Analysis of Algorithm 1 13
    4 Simulation Results and Discussions 16
    4.1 Simulation Settings 16
    4.2 Simulation Results 17
    4.2.1 Privacy amplification experiments 17
    4.2.2 Learning performance of Algorithm 1 20
    5 Conclusions 28
    A Proof of the Theorem 2 29
    B Proof of the Theorem 4 32
    B.1 Key Lemmas 33
    B.2 Completing the Proof of Theorem 4 34
    B.3 The Proofs of Key Lemmas 35
    B.3.1 Proof of Lemma 4 35
    B.3.2 Proof of Lemma 5 36
    B.3.3 Proof of Lemma 6 42
    Bibliography 44

