
Graduate Student: Hsieh, Chih-Yu (謝志優)
Thesis Title: A Study on AdaBoost Algorithm (探討AdaBoost演算法)
Advisor: Chen, Chaur-Chin (陳朝欽)
Committee Members: 朱學亭, 蘇豐文
Degree: Master
Department: College of Electrical Engineering and Computer Science, Institute of Information Systems and Applications
Year of Publication: 2014
Graduation Academic Year: 102
Language: English
Number of Pages: 24
Chinese Keywords: Ensemble Learning, Data Mining
Foreign Keywords: AdaBoost.M1, Ensemble Method
  • Classification is one of the most important techniques in data mining. Through classification, we can process labeled data to uncover hidden rules, and later use these rules to predict unknown data. Its applications in daily life are extensive. In medicine, for example, we can use this technique to find hidden rules in patients' genetic features and then apply those rules to other patients, which speeds up the medical process and gives physicians an additional reference for diagnosis. Data mining is therefore an important technique and discipline; with the arrival of Big Data, we must rely on it to extract the meaning and information hidden in data.
    In this thesis, we study the binary and multiclass AdaBoost (Adaptive Boosting) methods. Each sample is first assigned a weight; multiple weak classifiers are then trained by adjusting the sample weights. After training, the weak classifiers are combined into a single strong classifier, which can then be used to predict unknown data. We present experimental results of the AdaBoost algorithm on the colon cancer, breast cancer, 8OX, and Iris data sets.


    Classification is one of the most important techniques in data mining. Through classification, we can discover patterns and relationships among the parameters in data, and then use these patterns to predict unknown data. It is applied in many areas of real life. For example, we can discover patterns in patients' gene data by classification and then apply these patterns to other patients. Data mining is thus a key technology in data analysis; in the era of Big Data, such hidden information cannot be obtained without it.
    In this thesis, we study the binary and multiclass classification of the AdaBoost algorithm. In this algorithm, each sample carries a weight. T weak classifiers are trained in sequence on the training samples; after each round, the weights of the correctly and incorrectly classified samples are adjusted. Finally, the strong classifier combines the weighted votes of all individual weak classifiers, and this strong classifier is used to predict unknown data. Experiments on the colon cancer, breast cancer, 8OX, and Iris data sets are illustrated.
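    The weighting-and-voting procedure described above can be sketched in a minimal binary form. The abstract does not specify the weak learner, so the decision stump and the toy data set below are illustrative assumptions, not the thesis's actual setup:

```python
# Minimal sketch of binary AdaBoost with decision stumps as weak
# classifiers (an assumed, common choice). Labels must be in {-1, +1}.
import numpy as np

def train_adaboost(X, y, T=10):
    """Train T weighted decision stumps on (X, y)."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # each sample starts with equal weight
    stumps = []
    for _ in range(T):
        best = None
        # exhaustively search (feature, threshold, polarity) stumps
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] < thr, 1, -1)
                    err = w[pred != y].sum()  # weighted training error
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = max(err, 1e-10)                       # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)       # this stump's vote weight
        # raise the weights of misclassified samples, lower the rest
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()                                # renormalize to sum to 1
        stumps.append((alpha, j, thr, pol))
    return stumps

def predict(stumps, X):
    """Strong classifier: sign of the weighted votes of all stumps."""
    score = np.zeros(len(X))
    for alpha, j, thr, pol in stumps:
        score += alpha * pol * np.where(X[:, j] < thr, 1, -1)
    return np.sign(score)

# Hypothetical toy data: one feature, perfectly separable at x = 3.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1, 1, 1, -1, -1, -1])
stumps = train_adaboost(X, y, T=5)
labels = predict(stumps, X)
```

    Each round reweights the samples so the next weak classifier concentrates on the examples the previous ones got wrong; the final prediction is the sign of the alpha-weighted vote.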

    Table of Contents
    Chapter 1 Introduction
    Chapter 2 A Review of AdaBoost
      2.1 AdaBoost [Freu1995]
      2.2 The Algorithm of AdaBoost [Freu1995]
        2.2.1 Selection of the Number of Weak Classifiers
      2.3 Weak Classifier
      2.4 Strong Classifier
    Chapter 3 A Review of AdaBoost.M1
      3.1 AdaBoost.M1 [Freu1996]
      3.2 The Algorithm of AdaBoost.M1 [Freu1996]
      3.3 Weak Classifier
      3.4 Strong Classifier
      3.5 Discussion
    Chapter 4 Experimental Results
      4.1 The Input Data Sets of AdaBoost
      4.2 The Input Data Sets of AdaBoost.M1
      4.3 Experimental Results
        4.3.1 The Result of Classification by Using AdaBoost
        4.3.2 The Result of Classification by Using AdaBoost.M1
    Chapter 5 Conclusion and Future Work
    References

    References

    [Alon1999] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine, “Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays,” Proceedings of the National Academy of Sciences of the United States of America, Vol. 96, 6745-6750, 1999.
    [Brei1984] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, “Classification and Regression Trees,” Wadsworth International Group, Belmont, California, 1984.
    [Brei1994] L. Breiman, “Bagging Predictors,” Technical Report, No. 421, 1-19, 1994.
    [DLBA2013] J.D. De la Bastida Castillo, “Software for Gene Expression Data Analysis,” Institute of Information Systems and Applications, National Tsing Hua University, Hsinchu, Taiwan, May 2013.
    [Fish1936] R.A. Fisher, “The Use of Multiple Measurements in Taxonomic Problems,” Annals of Eugenics, 179-188, 1936.
    [Fraw1992] W.J. Frawley, G. Piatetsky-Shapiro, and C.J. Matheus, “Knowledge Discovery in Databases: An Overview,” AI Magazine, Vol. 13, 57-70, 1992.
    [Freu1995] Y. Freund and R.E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” AT&T Labs, New Jersey, 119-139, 1995.
    [Freu1996] Y. Freund and R.E. Schapire, “Experiments with a New Boosting Algorithm,” Proceedings of the Thirteenth International Conference on Machine Learning, 148-156, 1996.
    [Freu1999] Y. Freund and R.E. Schapire, “A Short Introduction to Boosting,” Journal of Japanese Society for Artificial Intelligence, Japan, 771-780, 1999.
    [Jain1988] A.K. Jain and R.C. Dubes, “Algorithms for Clustering Data,” Prentice-Hall, New Jersey, 1988.
    [Quin1993] J.R. Quinlan, “C4.5: Programs for Machine Learning,” Morgan Kaufmann, San Mateo, California, 1993.
    [Poli2006] R. Polikar, “Ensemble Based Systems in Decision Making,” IEEE Circuits and Systems Magazine, 21-45, 2006.
    [Scha1990] R.E. Schapire, “The Strength of Weak Learnability,” Machine Learning, 197-227, 1990.
    [Veer2002] L.J. van’t Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A.M. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, and S.H. Friend, “Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer,” Nature, Vol. 415, 530-536, 2002.
    [Web01] http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html, last access on May 15, 2014.
    [Web02] http://classes.engr.oregonstate.edu/eecs/spring2012/cs534/notes/, last access on May 15, 2014.
    [Web03] http://tel.archives-ouvertes.fr/docs/00/71/27/10/PDF/thesis.pdf, last access on May 15, 2014.
    [Web04] http://www.cs.waikato.ac.nz/ml/weka/, last access on May 15, 2014.

    Full-Text Release Date: Not authorized for public release (campus network)
    Full-Text Release Date: Not authorized for public release (off-campus network)
    Full-Text Release Date: Not authorized for public release (National Central Library: Taiwan Thesis and Dissertation System)