A Heuristically Perturbation of Dataset to Achieve a Diverse Ensemble of Classifiers

  • Hamid Parvin
  • Sajad Parvin
  • Zahra Rezaei
  • Moslem Mohamadi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7329)

Abstract

Ensemble methods like Bagging and Boosting, which combine the decisions of multiple hypotheses, are among the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. We present a new method for generating ensembles, named CDEBMTE (Creation of Diverse Ensemble Based on Manipulation of Training Examples), that directly constructs diverse hypotheses by manipulating the training examples in three ways: (1) sub-sampling training examples, (2) decreasing/increasing error-prone training examples and (3) decreasing/increasing neighbor samples of error-prone training examples.
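
The abstract gives no pseudocode, so the following is a minimal, hypothetical sketch of how a training set could be perturbed along the three lines listed above; it is not the authors' exact CDEBMTE procedure. The sampling ratio, duplication factor, neighborhood size and the use of scikit-learn's NearestNeighbors are illustrative assumptions.

  import numpy as np
  from sklearn.neighbors import NearestNeighbors

  def perturb_training_set(X, y, error_mask, rng,
                           sample_ratio=0.8, boost_factor=2, k_neighbors=5):
      """Build one perturbed training set (illustrative choices only).

      error_mask : boolean array marking the examples the current
                   ensemble misclassifies ("error-prone" examples).
      """
      n = len(y)

      # (1) sub-sample the training examples
      base_idx = rng.choice(n, size=int(sample_ratio * n), replace=False)

      # (2) increase the presence of error-prone examples by duplication
      error_idx = np.flatnonzero(error_mask)
      boosted_idx = np.repeat(error_idx, boost_factor)

      # (3) also increase the presence of neighbors of error-prone examples
      if len(error_idx) > 0:
          nn = NearestNeighbors(n_neighbors=min(k_neighbors, n)).fit(X)
          _, neigh = nn.kneighbors(X[error_idx])
          neighbor_idx = np.unique(neigh.ravel())
      else:
          neighbor_idx = np.array([], dtype=int)

      idx = np.concatenate([base_idx, boosted_idx, neighbor_idx])
      return X[idx], y[idx]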

The technique is a simple, general meta-learner that can use any strong learner as a base classifier to build diverse committees. Experimental results with two well-known base learners, (1) decision-tree induction and (2) the multilayer perceptron, demonstrate that this approach consistently achieves higher predictive accuracy than the base classifier, AdaBoost, and Bagging. CDEBMTE also outperforms AdaBoost more prominently as the training data size becomes larger.
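
As a usage illustration only, a committee of this kind could be grown around any scikit-learn base estimator and combined by plurality vote; the sketch below reuses the hypothetical perturb_training_set helper from above, assumes integer-coded class labels, and again fills in details the abstract leaves open.

  import numpy as np
  from sklearn.base import clone
  from sklearn.tree import DecisionTreeClassifier

  def build_committee(X, y, n_members=10, base=None):
      base = base if base is not None else DecisionTreeClassifier()
      rng = np.random.default_rng(0)
      members = []
      error_mask = np.zeros(len(y), dtype=bool)   # nothing is error-prone yet
      for _ in range(n_members):
          Xi, yi = perturb_training_set(X, y, error_mask, rng)
          members.append(clone(base).fit(Xi, yi))
          # plurality vote of the committee built so far
          votes = np.stack([m.predict(X) for m in members])
          majority = np.array([np.bincount(votes[:, j]).argmax()
                               for j in range(votes.shape[1])])
          error_mask = majority != y              # examples the committee gets wrong
      return members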

We show that CDEBMTE can be used effectively to achieve higher accuracy and to obtain better class membership probability estimates.

Keywords

Classifier Ensemble · Diversity · Training Examples Manipulation

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hamid Parvin (1)
  • Sajad Parvin (1)
  • Zahra Rezaei (1)
  • Moslem Mohamadi (1)
  1. Nourabad Mamasani Branch, Islamic Azad University, Nourabad Mamasani, Iran