
Student: Ciou, Sin-Cheng (邱信程)
Thesis Title: Consensus Learning for Sparse Principal Component Analysis (共識稀疏主成分分析)
Advisors: Lee, Yuh-Jye (李育杰); Tsai, Je-Chiang (蔡志強)
Committee Members: Wu, Chin-Tien (吳金典); Tsai, Je-Chiang (蔡志強); Lee, Hung-Yi (李宏毅)
Degree: Master
Department: Department of Mathematics, College of Science
Year of Publication: 2021
Graduation Academic Year: 109
Language: Chinese
Number of Pages: 55
Chinese Keywords: 共識學習, 聯邦式學習, 稀疏主成分分析
English Keywords: Consensus Learning, Federated Learning, Sparse Principal Component Analysis
    With the development and spread of computer hardware and the Internet, artificial intelligence has become one of the most prominent technologies of our time, and massive amounts of data allow us to solve many problems. In recent years, however, many countries have placed growing emphasis on privacy and have enacted strict privacy regulations, so that data stored on different devices can no longer be shared directly, which makes conventional deep learning techniques hard to apply. To address this problem, McMahan et al. at Google proposed the concept of federated learning in 2016. In this setting, we propose consensus sparse principal component analysis. Principal component analysis (PCA) is a versatile data analysis tool that performs well both for interpreting data and for reducing its dimensionality. However, because each principal component is a linear combination of all the original variables, the result of the linear transformation is hard to interpret when the data has many variables. To overcome this drawback, Zou et al. proposed sparse principal component analysis (SPCA) in 2006, in which each component is composed of only a small number of the original variables. Our two proposed models, CASPCA and CSSPCA, use the ADMM algorithm that is common in federated learning. In a setting with many worker nodes and one central server, each worker sends only model parameters to the central server and never its own data, thereby preserving data privacy; even under this constraint, our proposed models still obtain good results.
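    The communication pattern described above, in which workers exchange only parameters with a central server and never raw data, is the consensus form of ADMM from Boyd et al. [6]. As a rough, minimal sketch of that pattern, the following applies consensus ADMM to a generic ℓ1-regularized least-squares objective rather than to the thesis's actual CASPCA/CSSPCA formulations; all names here (soft_threshold, consensus_lasso_admm) are illustrative, not from the thesis.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def consensus_lasso_admm(A_list, b_list, lam=0.1, rho=1.0, n_iter=100):
    """Consensus ADMM for: min sum_i 0.5*||A_i x - b_i||^2 + lam*||x||_1.

    Worker i holds (A_i, b_i) privately; only the local iterates x_i, u_i
    and the broadcast consensus vector z cross the worker/server boundary.
    """
    n, N = A_list[0].shape[1], len(A_list)
    x = [np.zeros(n) for _ in range(N)]
    u = [np.zeros(n) for _ in range(N)]
    z = np.zeros(n)
    # Pre-invert each local system (A_i^T A_i + rho*I) once.
    inv_factors = [np.linalg.inv(A.T @ A + rho * np.eye(n)) for A in A_list]
    for _ in range(n_iter):
        # Local updates: each worker uses only its own data plus z.
        for i, (A, b) in enumerate(zip(A_list, b_list)):
            x[i] = inv_factors[i] @ (A.T @ b + rho * (z - u[i]))
        # Central update: average the workers' (x_i + u_i), then shrink.
        z = soft_threshold(np.mean([x[i] + u[i] for i in range(N)], axis=0),
                           lam / (rho * N))
        # Dual updates, kept locally by each worker.
        for i in range(N):
            u[i] = u[i] + x[i] - z
    return z
```

    The key point mirrors the abstract: the loop over workers touches only its local (A_i, b_i), so the central server never sees any raw data.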


    Learning from data has become the mainstream of modern machine learning applications. The more data we have, the better our machine learning methods perform, provided the data quality is good enough. In many cases, however, data owners may not want to share, or may not be allowed to share, the data they hold because of privacy issues or legal concerns. To solve this problem, Google proposed federated learning, also known as consensus learning, in 2016. This framework has been applied to linear and nonlinear SVMs as well as PCA. In this work, we apply it to sparse principal component analysis (SPCA). The sparse loadings are composed of as few non-zero elements as possible while keeping as much of the data variation as possible when the data is projected onto them. Our proposed methods, CASPCA and CSSPCA, are solved by a distributed optimization algorithm, ADMM, which allows different data owners to train models together without sharing their own data. We demonstrate our consensus SPCA methods on synthetic datasets and a public dataset.
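    To make "sparse loadings" concrete, the short example below contrasts ordinary PCA, whose loadings mix all input variables, with sparse loadings that keep only a few non-zero entries. It uses scikit-learn's SparsePCA (an implementation in the spirit of Zou, Hastie, and Tibshirani [3]) purely as an illustration; it is not the CASPCA/CSSPCA methods proposed in this thesis, and the synthetic data is invented for the example.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 10 variables, with variance driven by
# two hidden factors that each touch only a few variables.
X = rng.normal(size=(200, 10))
X[:, :3] += 3.0 * rng.normal(size=(200, 1))   # first hidden factor
X[:, 3:6] += 2.0 * rng.normal(size=(200, 1))  # second hidden factor

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=2.0, random_state=0).fit(X)

# Dense PCA loadings involve all 10 variables; sparse loadings zero
# out most of them, which is what makes the components interpretable.
print("PCA nonzeros per component: ",
      np.count_nonzero(pca.components_, axis=1))
print("SPCA nonzeros per component:",
      np.count_nonzero(spca.components_, axis=1))
```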

    1 Introduction
    2 Related Work
      2.1 Sparse Principal Component Analysis
      2.2 Smoothing Function
    3 Alternating Direction Method of Multipliers
      3.1 Dual Ascent Method
      3.2 Method of Multipliers
      3.3 Alternating Direction Method of Multipliers
      3.4 Consensus Optimization
    4 Consensus Sparse Principal Component Analysis
      4.1 Consensus Approximation Sparse Principal Component Analysis
        4.1.1 Deflation Method for CASPCA
      4.2 Consensus Smoothing Sparse Principal Component Analysis
        4.2.1 Consensus Smoothing Sparse PCA
        4.2.2 Line-search Method on the Stiefel Manifold
        4.2.3 Deflation Method for CSSPCA
    5 Experiments
      5.1 Experiments on Synthetic Data
        5.1.1 Consensus Approximation Sparse Principal Component Analysis
        5.1.2 Consensus Smooth Sparse Principal Component Analysis with fixed λ1
        5.1.3 Consensus Smooth Sparse Principal Component Analysis with fixed λ2
      5.2 Experiments on Real Data
        5.2.1 Wisconsin Diagnostic Breast Cancer
    6 Conclusion and Future Work
    Reference

    [1] K. Pearson, “LIII. On lines and planes of closest fit to systems of points in space,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 2, no. 11, pp. 559–572, 1901.
    [2] Y.-J. Lee, Y.-R. Yeh, and Y.-C. F. Wang, “Anomaly detection via online oversampling principal component analysis,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 7, pp. 1460–1470, 2012.
    [3] H. Zou, T. Hastie, and R. Tibshirani, “Sparse principal component analysis,” Journal of Computational and Graphical Statistics, vol. 15, no. 2, pp. 265–286, 2006.
    [4] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics, PMLR, 2017, pp. 1273–1282.
    [5] A. Grammenos, R. Mendoza Smith, J. Crowcroft, and C. Mascolo, “Federated principal component analysis,” Advances in Neural Information Processing Systems, vol. 33, 2020.
    [6] S. Boyd, N. Parikh, and E. Chu, Distributed optimization and statistical learning via the alternating direction method of multipliers. Now Publishers Inc, 2011.
    [7] H. Zou and L. Xue, “A selective overview of sparse principal component analysis,” Proceedings of the IEEE, vol. 106, no. 8, pp. 1311–1320, 2018.
    [8] I. T. Jolliffe, N. T. Trendafilov, and M. Uddin, “A modified principal component technique based on the lasso,” Journal of Computational and Graphical Statistics, vol. 12, no. 3, pp. 531–547, 2003.
    [9] F. Chen and K. Rohe, “A new basis for sparse PCA,” arXiv preprint arXiv:2007.00596, 2020.
    [10] M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre, “Generalized power method for sparse principal component analysis,” Journal of Machine Learning Research, vol. 11, no. 2, 2010.
    [11] X.-T. Yuan and T. Zhang, “Truncated power method for sparse eigenvalue problems,” Journal of Machine Learning Research, vol. 14, no. 4, 2013.
    [12] J. Ge, Z. Wang, M. Wang, and H. Liu, “Minimax-optimal privacy-preserving sparse PCA in distributed systems,” in International Conference on Artificial Intelligence and Statistics, PMLR, 2018, pp. 1589–1598.
    [13] V. Q. Vu, J. Cho, J. Lei, and K. Rohe, “Fantope projection and selection: A near-optimal convex relaxation of sparse PCA,” in Advances in Neural Information Processing Systems, 2013, pp. 2670–2678.
    [14] A. d’Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. Lanckriet, “A direct formulation for sparse PCA using semidefinite programming,” SIAM Review, vol. 49, no. 3, pp. 434–448, 2007.
    [15] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
    [16] M. Hebiri, S. Van De Geer, et al., “The smooth-lasso and other ℓ1 + ℓ2-penalized methods,” Electronic Journal of Statistics, vol. 5, pp. 1184–1226, 2011.
    [17] B. Saheya, C.-H. Yu, and J.-S. Chen, “Numerical comparisons based on four smoothing functions for absolute value equation,” Journal of Applied Mathematics and Computing, vol. 56, no. 1, pp. 131–149, 2018.
    [18] B. Saheya, C. T. Nguyen, and J.-S. Chen, “Neural network based on systematically generated smoothing functions for absolute value equation,” Journal of Applied Mathematics and Computing, vol. 61, no. 1, pp. 533–558, 2019.
    [19] Y.-J. Lee and O. L. Mangasarian, “SSVM: A smooth support vector machine for classification,” Computational Optimization and Applications, vol. 20, no. 1, pp. 5–22, 2001.
    [20] Y.-J. Lee, W.-F. Hsieh, and C.-M. Huang, “ε-SSVR: A smooth support vector machine for ε-insensitive regression,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 678–685, 2005.
    [21] R. Glowinski and A. Marroco, “Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires,” ESAIM: Mathematical Modelling and Numerical Analysis-Modélisation Mathématique et Analyse Numérique, vol. 9, no. R2, pp. 41–76, 1975.
    [22] D. Gabay and B. Mercier, “A dual algorithm for the solution of nonlinear variational problems via finite element approximation,” Computers & Mathematics with Applications, vol. 2, no. 1, pp. 17–40, 1976.
    [23] I. Damgård, V. Pastro, N. Smart, and S. Zakarias, “Multiparty computation from somewhat homomorphic encryption,” in Annual Cryptology Conference, Springer, 2012, pp. 643–662.
    [24] R. Cramer, I. Damgård, and J. B. Nielsen, “Multiparty computation from threshold homomorphic encryption,” in International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2001, pp. 280–300.
    [25] J. A. Garay, P. MacKenzie, and K. Yang, “Strengthening zero-knowledge protocols using signatures,” in International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 2003, pp. 177–194.
    [26] S. Silva, B. A. Gutman, E. Romero, P. M. Thompson, A. Altmann, and M. Lorenzi, “Federated learning in distributed medical databases: Meta-analysis of large-scale subcortical brain data,” in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), IEEE, 2019, pp. 270–274.
    [27] W. Zheng, R. A. Popa, J. E. Gonzalez, and I. Stoica, “Helen: Maliciously secure coopetitive learning for linear models,” in 2019 IEEE Symposium on Security and Privacy (SP), IEEE, 2019, pp. 724–738.
    [28] L. He, A. Bian, and M. Jaggi, “COLA: Decentralized linear learning,” arXiv preprint arXiv:1808.04883, 2018.
    [29] V. Smith, S. Forte, M. Chenxin, M. Takáč, M. I. Jordan, and M. Jaggi, “CoCoA: A general framework for communication-efficient distributed optimization,” Journal of Machine Learning Research, vol. 18, p. 230, 2018.
    [30] S. Ma, “Alternating direction method of multipliers for sparse principal component analysis,” Journal of the Operations Research Society of China, vol. 1, no. 2, pp. 253–274, 2013.
    [31] M. Tan, Z. Hu, Y. Yan, J. Cao, D. Gong, and Q. Wu, “Learning sparse PCA with stabilized ADMM method on Stiefel manifold,” IEEE Transactions on Knowledge and Data Engineering, 2019.
    [32] D. Hajinezhad and M. Hong, “Nonconvex alternating direction method of multipliers for distributed sparse principal component analysis,” in 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, 2015, pp. 255–259.
    [33] M. R. Hestenes, “Multiplier and gradient methods,” Journal of Optimization Theory and Applications, vol. 4, no. 5, pp. 303–320, 1969.
    [34] M. J. Powell, “A method for nonlinear constraints in minimization problems,” Optimization, pp. 283–298, 1969.
    [35] S. Papadimitriou, J. Sun, and C. Faloutsos, “Streaming pattern discovery in multiple time-series,” in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), 2005.
    [36] B. Yang, “Projection approximation subspace tracking,” IEEE Transactions on Signal Processing, vol. 43, no. 1, pp. 95–107, 1995.
    [37] L. W. Mackey, “Deflation methods for sparse PCA,” in NIPS, vol. 21, 2008, pp. 1017–1024.
    [38] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds. Princeton University Press, 2009.
    [39] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al., “Advances and open problems in federated learning,” arXiv preprint arXiv:1912.04977, 2019.
    [40] W. Ring and B. Wirth, “Optimization methods on Riemannian manifolds and their application to shape space,” SIAM Journal on Optimization, vol. 22, no. 2, pp. 596–627, 2012.
    [41] H. Sato and K. Aihara, “Cholesky QR-based retraction on the generalized Stiefel manifold,” Computational Optimization and Applications, vol. 72, no. 2, pp. 293–308, 2019.
    [42] D. Dua and C. Graff, UCI Machine Learning Repository, 2017. [Online]. Available: http://archive.ics.uci.edu/ml.
    [43] R. Zhang and J. Kwok, “Asynchronous distributed ADMM for consensus optimization,” in International Conference on Machine Learning, PMLR, 2014, pp. 1701–1709.
    [44] T.-H. Chang, M. Hong, W.-C. Liao, and X. Wang, “Asynchronous distributed ADMM for large-scale optimization—Part I: Algorithm and convergence analysis,” IEEE Transactions on Signal Processing, vol. 64, no. 12, pp. 3118–3130, 2016.
