
Graduate Student: Yeh, Li-Chia (葉力嘉)
Thesis Title: A Rigorous Proof of Finite Iterations in SMO-SVM and Parameter Optimization in Kernel-Based SVM (針對支撐向量機器中SMO有限疊代的嚴謹證明及核支撐向量機器之參數最佳化)
Advisor: Lu, Chung-Chin (呂忠津)
Committee Members: Chen, Bor-Sen (陳博現); Huang, Yuan-Hao (黃元豪); Ma, Hsi-Pin (馬席彬); Su, Szu-Lin (蘇賜麟); Lin, Mao-Chao (林茂昭)
Degree: Doctor
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science (電機資訊學院 - 電機工程學系)
Year of Publication: 2019
Graduation Academic Year: 107 (ROC calendar, 2018-2019)
Language: English
Number of Pages: 76
Chinese Keywords: 支撐向量機、有限疊代、超參數最佳化、分類、機器學習、核矩陣
English Keywords: support vector machine, finite iteration, hyperparameter optimization, classification, machine learning, kernel matrix


Abstract:
    Support vector machine (SVM) is a well-known supervised binary classifier. However, when the dataset is large, it is hard to train on all the data at once to obtain an optimized classifier. In 1998, Platt proposed an iterative algorithm, sequential minimal optimization (SMO), in which only two Lagrangian multipliers λ_i and λ_j are selected and updated in each iteration in order to minimize the computational cost. When SVM is implemented with SMO, we call it SMO-SVM. Later, in 2001, Keerthi et al. improved Platt's SMO by simplifying the optimality check to the stopping criterion F_i − F_j ≤ τ, where τ > 0 is a nonzero tolerance, and in 2002 Keerthi and Gilbert proved that SMO-SVM terminates in finitely many iterations. Although later researchers filled in parts of the incomplete argument in Keerthi and Gilbert's work, their proofs still relied on the asymptotic behavior of the Lagrangian multipliers under the assumption of an infinitely iterating SMO-SVM; such an assumption is invalid under the stopping criterion F_i − F_j ≤ τ. In this research, we give a new and rigorous proof that SMO-SVM terminates in finitely many iterations, closing the gap in Keerthi and Gilbert's proof. We also analyze the relations between the hyperparameters and the test error rate. Based on these discoveries, we propose mini core validation (miniCV) to quickly screen out an optimized hyperparameter combination, especially for large datasets. The proposed miniCV is a parameter optimization approach built entirely on the distribution of the data generated by the iterative SMO training process. Since miniCV depends only on the kernel matrix, it avoids the cost of cross-validation when optimizing hyperparameters in kernel-based SVM. Moreover, with our key findings on a training-related variable that traces test performance, miniCV can locate a robust hyperparameter combination with respect to the given training dataset.
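
    To make the pairwise update and the stopping criterion concrete, here is a minimal sketch of a maximal-violating-pair SMO loop over a precomputed kernel matrix, assuming the standard SVM dual with box constraints 0 ≤ λ_k ≤ C, the equality constraint Σ_k λ_k y_k = 0, and F_k = Σ_l λ_l y_l K(x_k, x_l) − y_k. It is an illustrative simplification, not the exact algorithm analyzed in the thesis; the name smo_train and all parameter defaults are hypothetical.

        import numpy as np

        def smo_train(K, y, C=1.0, tau=1e-3, max_iter=100000):
            # K: precomputed n-by-n kernel matrix; y: labels in {+1, -1}.
            # Hypothetical illustrative routine, not the thesis implementation.
            n = len(y)
            lam = np.zeros(n)              # Lagrangian multipliers, start at 0
            F = -y.astype(float)           # F_k = sum_l lam_l y_l K_kl - y_k
            for it in range(max_iter):
                # Index sets whose multipliers can still move inside [0, C].
                up = ((lam < C) & (y > 0)) | ((lam > 0) & (y < 0))
                low = ((lam > 0) & (y > 0)) | ((lam < C) & (y < 0))
                i = int(np.argmin(np.where(up, F, np.inf)))    # min F over I_up
                j = int(np.argmax(np.where(low, F, -np.inf)))  # max F over I_low
                if F[j] - F[i] <= tau:     # Keerthi-style stopping criterion
                    return lam, it
                # Best analytic step along the direction preserving sum(lam*y)=0.
                eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
                t = (F[j] - F[i]) / max(eta, 1e-12)
                # Clip so both multipliers remain inside the box [0, C].
                t = min(t,
                        C - lam[i] if y[i] > 0 else lam[i],
                        lam[j] if y[j] > 0 else C - lam[j])
                lam[i] += y[i] * t
                lam[j] -= y[j] * t
                F += t * (K[:, i] - K[:, j])  # rank-two update of every F_k
            return lam, max_iter

        # Toy usage with an RBF kernel (gamma = 0.5) on synthetic data:
        rng = np.random.default_rng(0)
        X = rng.normal(size=(40, 2))
        y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
        K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
        lam, iters = smo_train(K, y)
        print(iters, int((lam > 1e-8).sum()), "iterations / support vectors")

    Because the pair (i, j) is chosen only when F_j − F_i > τ, and the clipped step t stays strictly positive for a positive semidefinite kernel, each iteration strictly increases the dual objective; this is the intuition behind the finite-termination result that the thesis makes rigorous.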

Table of Contents:
    1 Introduction 8
    2 Literature Review in Parameter Optimization 13
    3 The Sequential Minimal Optimization Algorithm 15
      3.1 Derivation of SMO 15
        3.1.1 The update rules of the SMO algorithm 17
      3.2 Key Variables 22
    4 Finite Iterations in SMO 30
      4.1 Some Properties of SMO 30
      4.2 A Rigorous Proof of Finite Iterations 31
      4.3 SMO-SVM Algorithm for Experiment 42
    5 Experiments on Parameters 45
      5.1 Regulator C 46
      5.2 Kernel Parameters in the RBF Kernel SVM 46
      5.3 Kernel Parameters d and α̃ in the Transformed Polynomial Kernel SVM 47
      5.4 An Intermediate Variable in the Kernel-Based SVM 47
    6 Key Findings from Experiment Results 48
      6.1 Regulator C 48
      6.2 RBF Kernel Parameters 49
      6.3 Kernel Parameters d and α̃ in the Transformed Polynomial Kernel SVM 53
      6.4 A Fast Estimation of an Appropriate α̃ 54
        6.4.1 Estimation of γ in the RBF Kernel SVM 57
        6.4.2 Estimation of d and α̃ in the Transformed Polynomial Kernel SVM 57
    7 Devised Algorithm and Performance Evaluation 65
      7.1 Devised Algorithm 65
        7.1.1 Optimization of Hyperparameters in RBF Kernel SVM 66
        7.1.2 Optimization of Hyperparameters in Transformed Polynomial Kernel SVM 66
      7.2 An Alterable Tolerance for the Training Process of SMO-SVM 67
      7.3 Performance Evaluation 68
    8 Conclusion 72
    Bibliography 74

References:
    [1] V. N. Vapnik, The Nature of Statistical Learning Theory. Berlin, Heidelberg: Springer-Verlag, 1995.
    [2] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998. [Online]. Available: https://doi.org/10.1162/089976698300017467
    [3] B. Schölkopf, R. Herbrich, and A. J. Smola, “A generalized representer theorem,” in Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory, ser. COLT ’01/EuroCOLT ’01. London, UK: Springer-Verlag, 2001, pp. 416–426. [Online]. Available: http://dl.acm.org/citation.cfm?id=648300.755324
    [4] J. C. Platt, “A fast algorithm for training support vector machines,” Advances in Kernel Methods - Support Vector Learning, vol. 208, Jul. 1998.
    [5] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy, “Improvements to Platt’s SMO algorithm for SVM classifier design,” Neural Computation, vol. 13, no. 3, pp. 637–649, Mar. 2001.
    [6] S. Keerthi and E. Gilbert, “Convergence of a generalized SMO algorithm for SVM classifier design,” Machine Learning, vol. 46, no. 1, pp. 351–360, Jan. 2002. [Online]. Available: https://doi.org/10.1023/A:1012431217818
    [7] N. Takahashi and T. Nishi, “Rigorous proof of termination of SMO algorithm for support vector machines,” IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 774–776, May 2005.
    [8] D. Dua and C. Graff, “UCI machine learning repository,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml
    [9] A. Tharwat, A. E. Hassanien, and B. E. Elnaghi, “A BA-based algorithm for parameter optimization of support vector machine,” Pattern Recognition Letters, vol. 93, pp. 13–22, 2017, Pattern Recognition Techniques in Data Mining.
    [10] P. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. London, UK: Prentice-Hall, 1982.
    [11] S. Geisser, Predictive Inference. New York, NY: Chapman and Hall, 1993.
    [12] R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, ser. IJCAI’95. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1995, pp. 1137–1143. [Online]. Available: http://dl.acm.org/citation.cfm?id=1643031.1643047
    [13] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multiclass support vector machines,” IEEE Transactions on Neural Networks, vol. 13, no. 2, pp. 415–425, Mar. 2002. [Online]. Available: https://doi.org/10.1109/72.991427
    [14] A. L. D. Rossi and A. C. P. L. F. de Carvalho, “Bio-inspired optimization techniques for SVM parameter tuning,” in 2008 10th Brazilian Symposium on Neural Networks, Oct. 2008, pp. 57–62.
    [15] S. Lessmann, R. Stahlbock, and S. F. Crone, “Genetic algorithms for support vector machine model selection,” in The 2006 IEEE International Joint Conference on Neural Network Proceedings, July 2006, pp. 3063–3069.
    [16] B. F. De Souza, A. C. P. L. F. de Carvalho, R. Calvo, and R. P. Ishii, “Multiclass SVM model selection using particle swarm optimization,” in 2006 Sixth International Conference on Hybrid Intelligent Systems (HIS’06), Dec. 2006, pp. 31–31.
    [17] X. Zhang, X. Chen, and Z. He, “An ACO-based algorithm for parameter optimization of support vector machines,” Expert Systems with Applications, vol. 37, no. 9, pp. 6618–6628, 2010. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0957417410002630
    [18] I. Aydin, M. Karakose, and E. Akin, “A multi-objective artificial immune algorithm for parameter optimization in support vector machine,” Applied Soft Computing, vol. 11, no. 1, pp. 120–129, 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1568494609002166
    [19] R.-E. Fan, P.-H. Chen, and C.-J. Lin, “Working set selection using second order information for training support vector machines,” Journal of Machine Learning Research, vol. 6, pp. 1889–1918, Dec. 2005. [Online]. Available: http://dl.acm.org/citation.cfm?id=1046920.1194907
    [20] U. von Luxburg, “A tutorial on spectral clustering,” CoRR, vol. abs/0711.0189, 2007. [Online]. Available: http://arxiv.org/abs/0711.0189
    [21] Y. LeCun, C. Cortes, and C. J. C. Burges, “The MNIST database of handwritten digits,” http://yann.lecun.com/exdb/mnist/, accessed: 2018-11-30.
    [22] B. Johnson, R. Tateishi, and N. Hoan, “A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees,” International Journal of Remote Sensing, vol. 34, pp. 6969–6982, Oct. 2013.
    [23] R. J. Lyon, B. W. Stappers, S. Cooper, J. M. Brooke, and J. D. Knowles, “Fifty years of pulsar candidate selection: from simple filters to a new principled real-time classification approach,” Monthly Notices of the Royal Astronomical Society, vol. 459, no. 1, pp. 1104–1123, Apr. 2016. [Online]. Available: https://doi.org/10.1093/mnras/stw656
    [24] M. A. U. H. Tahir, S. Asghar, A. Manzoor, and M. A. Noor, “A classification model for class imbalance dataset using genetic programming,” IEEE Access, vol. 7, pp. 71013–71037, 2019.
    [25] T. M. Mohamed, “Pulsar selection using fuzzy KNN classifier,” Future Computing and Informatics Journal, vol. 3, no. 1, pp. 1–6, 2018.
