Author: Yun-Sheng Chung (鍾允昇)
Title: Ensemble Performance in terms of Diversity and Performance of Individual Classifier Systems (從分類器差異性與個別性能評估多分類器整體性能)
Advisor: Chuan Yi Tang (唐傳義)
Committee members:
Degree: Doctor (博士)
Department: Department of Computer Science (資訊工程學系), College of Electrical Engineering and Computer Science (電機資訊學院)
Year of publication: 2007
Graduation academic year: 96
Language: English
Number of pages: 70
Keywords (Chinese): 多分類器、過半數投票制、多數決投票制、分類器差異性、整體性能
Keywords (English): Multiple classifier systems, majority voting, plurality voting, classifier diversity, ensemble performance
Combining multiple classifier systems is known to yield better performance than any single classifier. Previous studies have shown that the overall performance of a multiple classifier system depends mainly on the performance of the individual classifiers and on the diversity among them. Many diversity measures and ensemble performance estimates have been proposed and studied, yet estimating ensemble performance from individual performance and diversity remains a challenging problem. This study first uses the concept of a performance distribution pattern (PDP) to formulate the general problem of evaluating ensemble performance under different classifier combinations. In particular, upper and lower bounds are established for (a) majority voting performance in terms of the disagreement diversity Dis, (b) weighted majority voting performance in terms of weighted average performance and weighted disagreement, and (c) plurality voting performance in terms of the entropy diversity measure D̄. Given input data characterized by a PDP, the bounds obtained for these three cases can be made quite tight. Based on our earlier results on diversity equivalence, case (a) extends to several other diversity measures. In addition, this study shows that in case (a), when the average individual performance P̄ is sufficiently high, the ensemble performance Pm obtained from the maximum-entropy PDP is an increasing function of the disagreement diversity Dis. Eight data sets from different application domains are used to demonstrate the complexity and heterogeneity of the general problem defined here.
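The abstract refers to the average individual performance P̄ and the pairwise disagreement diversity Dis. As a minimal illustration only, assuming the standard oracle-output formulation (a 0/1 matrix recording which classifier is correct on which example), the following Python sketch computes both quantities; the function names and toy data are hypothetical and not taken from the thesis:

```python
import numpy as np

def average_accuracy(oracle):
    """Average individual accuracy (P-bar).
    oracle: (L, N) 0/1 array; entry [i, j] = 1 iff classifier i is correct on example j."""
    return float(oracle.mean())

def disagreement_diversity(oracle):
    """Pairwise disagreement diversity (Dis): for each pair of classifiers, the
    fraction of examples on which exactly one of the two is correct, averaged
    over all L*(L-1)/2 pairs."""
    L, _ = oracle.shape
    total = 0.0
    for i in range(L):
        for k in range(i + 1, L):
            total += float(np.mean(oracle[i] != oracle[k]))
    return 2.0 * total / (L * (L - 1))

# Toy oracle outputs for 3 classifiers on 6 examples (hypothetical data).
oracle = np.array([[1, 1, 0, 1, 0, 1],
                   [1, 0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0, 1]])
print(average_accuracy(oracle))        # P-bar
print(disagreement_diversity(oracle))  # Dis
```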
Combining multiple classifier systems (MCSs) has been shown to outperform a single classifier system. It has been demonstrated that the improvement in ensemble performance depends on both the performance of and the diversity among the individual systems. A variety of diversity measures and ensemble methods have been proposed and studied, yet it remains a challenging problem to estimate the ensemble performance in terms of the performance of and the diversity among individual systems. In this work, we establish upper and lower bounds for (a) majority voting ensemble performance with the disagreement diversity measure Dis, (b) weighted majority voting performance in terms of weighted average performance and weighted disagreement diversity, and (c) plurality voting ensemble performance with the entropy diversity measure D̄. The bounds for these three cases are shown to be tight using the concept of a performance distribution pattern (PDP) for the input set. As a consequence of our previous results on diversity equivalence, case (a) can be extended to several other diversity measures. Moreover, we show for case (a) that when the average individual performance P̄ is sufficiently high, the ensemble performance Pm resulting from a maximum (information-theoretic) entropy PDP is an increasing function of the disagreement diversity Dis. Eight experiments using data sets from various application domains are conducted to demonstrate the complexity, richness, and diverseness of the problem of estimating the ensemble performance.
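As a companion illustration of how the ensemble performance Pm is measured on a labeled set under plurality voting (for two classes this coincides with simple majority voting, ignoring ties), the sketch below is an assumption-laden example with hypothetical toy data, not the bounds or the experimental code of the thesis:

```python
import numpy as np

def plurality_vote_accuracy(predictions, labels, n_classes):
    """Ensemble accuracy (Pm) under plurality voting: for each example, the class
    receiving the most individual votes wins (ties broken toward the lowest class index).
    predictions: (L, N) array of class labels from L classifiers on N examples.
    labels: (N,) array of true class labels."""
    _, N = predictions.shape
    winners = np.empty(N, dtype=int)
    for j in range(N):
        counts = np.bincount(predictions[:, j], minlength=n_classes)
        winners[j] = int(counts.argmax())
    return float(np.mean(winners == labels))

# Toy example: 5 classifiers, 4 examples, 3 classes (hypothetical data).
preds = np.array([[0, 1, 2, 1],
                  [0, 1, 1, 1],
                  [1, 2, 2, 0],
                  [0, 1, 2, 1],
                  [2, 1, 2, 2]])
labels = np.array([0, 1, 2, 1])
print(plurality_vote_accuracy(preds, labels, n_classes=3))
```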