Heuristic Perturbation of a Dataset to Achieve a Diverse Ensemble of Classifiers
Ensemble methods such as Bagging and Boosting, which combine the decisions of multiple hypotheses, are among the strongest existing machine learning methods. The diversity of an ensemble's members is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, named CDEBMTE (Creation of Diverse Ensemble Based on Manipulation of Training Examples), that directly constructs diverse hypotheses by manipulating the training examples in three ways: (1) sub-sampling training examples, (2) decreasing/increasing error-prone training examples, and (3) decreasing/increasing neighbor samples of error-prone training examples.
The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results using two well-known classifiers, (1) decision-tree induction and (2) the multilayer perceptron, as base learners demonstrate that this approach consistently achieves higher predictive accuracy than the base classifier, AdaBoost, and Bagging. CDEBMTE also outperforms AdaBoost more prominently as the training set grows larger.
We show that CDEBMTE can be effectively used to achieve higher accuracy and to obtain better class-membership probability estimates.
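The excerpt does not include pseudocode for CDEBMTE, but the three training-example manipulations can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy data, the k-NN stand-in for the decision-tree/MLP base learners, and all function names (`error_prone`, `neighbors_of`, `ensemble_predict`) are assumptions for demonstration only.

```python
# Hypothetical sketch of CDEBMTE-style training-set manipulation.
# Three perturbed copies of the training set yield diverse ensemble
# members; their predictions are combined by majority vote.
import random
from collections import Counter

random.seed(0)

# Toy 1-D two-class data: class 0 centered at 0.0, class 1 at 1.0.
data = [(random.gauss(0.0, 0.4), 0) for _ in range(30)] + \
       [(random.gauss(1.0, 0.4), 1) for _ in range(30)]

def knn_predict(train, x, k=3):
    """Simple k-NN base learner (stand-in for a decision tree or MLP)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def error_prone(train):
    """Examples the base learner misclassifies on the training set itself."""
    return [p for p in train if knn_predict(train, p[0]) != p[1]]

def neighbors_of(train, points, radius=0.3):
    """Training examples lying near any of the given (error-prone) points."""
    return [p for p in train
            if any(abs(p[0] - e[0]) <= radius for e in points)]

hard = error_prone(data)
members = [
    random.sample(data, 40),          # (1) sub-sampling training examples
    data + hard,                      # (2) increasing error-prone examples
    data + neighbors_of(data, hard),  # (3) increasing their neighbors
]

def ensemble_predict(x):
    """Majority vote over the base learner trained on each perturbed set."""
    votes = [knn_predict(m, x) for m in members]
    return Counter(votes).most_common(1)[0][0]

print(ensemble_predict(-0.2), ensemble_predict(1.2))
```

Each member sees a differently distorted view of the data, which is the source of the diversity the abstract emphasizes; the "decreasing" variants of manipulations (2) and (3) would instead remove those examples rather than duplicate them.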
Keywords: Classifier Ensemble, Diversity, Training Examples Manipulation