
Genetic Programming for Classification with Unbalanced Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6021)

Abstract

Learning algorithms can suffer a performance bias when data sets have only a small number of training examples for one or more classes. In this scenario, learning methods can produce deceptively “good-looking” results even when classification performance on the important minority class is poor. This paper compares two Genetic Programming (GP) approaches for classification with unbalanced data. The first adapts the fitness function to evolve classifiers with good classification ability across both the minority and majority classes. The second uses a multi-objective approach to simultaneously evolve a Pareto front (or set) of classifiers along the minority and majority class trade-off surface. Our results show that solutions with good classification ability were evolved across a range of binary classification tasks with unbalanced data.
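The bias described in the abstract, and the two remedies it names, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the equal weighting of the two class accuracies, and the toy data are all assumptions made for illustration. It shows how overall accuracy can look good while the minority class is entirely misclassified, how an average of per-class accuracies exposes this, and how a Pareto front keeps only the non-dominated trade-offs between minority and majority accuracy.

```python
from typing import List, Sequence, Tuple

# Assumed convention for this sketch: label 1 = minority class, 0 = majority class.


def class_accuracies(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (minority_accuracy, majority_accuracy)."""
    min_total = sum(1 for t in y_true if t == 1)
    maj_total = len(y_true) - min_total
    min_hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    maj_hits = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return min_hits / min_total, maj_hits / maj_total


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all examples classified correctly (biased by class sizes)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_fitness(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Hypothetical fitness: equally weighted mean of the two class accuracies."""
    minority_acc, majority_acc = class_accuracies(y_true, y_pred)
    return 0.5 * (minority_acc + majority_acc)


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the points that no other point dominates (both objectives maximised)."""

    def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
        return a[0] >= b[0] and a[1] >= b[1] and a != b

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy data: 5 minority examples and 95 majority examples.
y_true = [1] * 5 + [0] * 95
always_majority = [0] * 100  # a classifier that ignores the minority class

print(overall_accuracy(y_true, always_majority))  # 0.95
print(balanced_fitness(y_true, always_majority))  # 0.5

# Made-up (minority_accuracy, majority_accuracy) pairs for five candidate classifiers.
candidates = [(0.0, 1.0), (0.6, 0.9), (0.8, 0.7), (0.5, 0.8), (1.0, 0.2)]
print(pareto_front(candidates))
# [(0.0, 1.0), (0.6, 0.9), (0.8, 0.7), (1.0, 0.2)]
```

The candidate (0.5, 0.8) is dropped because (0.6, 0.9) is at least as good on both objectives. The remaining four classifiers each represent a different trade-off, which is the kind of set a multi-objective approach returns in place of a single solution.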

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bhowan, U., Zhang, M., Johnston, M. (2010). Genetic Programming for Classification with Unbalanced Data. In: Esparcia-Alcázar, A.I., Ekárt, A., Silva, S., Dignum, S., Uyar, A.Ş. (eds) Genetic Programming. EuroGP 2010. Lecture Notes in Computer Science, vol 6021. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12148-7_1

  • DOI: https://doi.org/10.1007/978-3-642-12148-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12147-0

  • Online ISBN: 978-3-642-12148-7

  • eBook Packages: Computer Science, Computer Science (R0)
