On the Impact of Class Imbalance in GP Streaming Classification with Label Budgets

  • Sara Khanchi
  • Malcolm I. Heywood
  • Nur Zincir-Heywood
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9594)


Streaming data scenarios introduce a set of requirements that do not exist under the supervised learning paradigms typically employed for classification. Specific examples include anytime operation, non-stationary processes, and limited label budgets. From the perspective of class imbalance, this implies that it is not even possible to guarantee that all classes are present in the samples of data used to construct a model. Moreover, when decisions are made regarding what subset of data to sample, no label information is available; labels are provided only after sampling. This is a more challenging task than that encountered under non-streaming (offline) scenarios, where the training partition already contains label information. We investigate the utility of different protocols for sampling from the stream under the above constraints. Adopting a uniform sampling protocol was previously shown to be reasonably effective for both evolutionary and non-evolutionary streaming classifiers. In this work, we introduce a scheme that uses the current ‘champion’ classifier to bias the sampling of training instances during the course of the stream. The resulting streaming framework for genetic programming is more effective at sampling minor classes and therefore at reacting to changes in the underlying process responsible for generating the data stream.
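To make the contrast between the two sampling protocols concrete, the following is a minimal sketch, not the authors' implementation. It assumes the stream is processed in fixed-size windows, that a label budget caps how many instances per window may have their labels requested, and that the champion is any callable returning a predicted class. The function names `uniform_sample` and `champion_biased_sample`, the round-robin balancing rule, and the toy threshold champion are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def uniform_sample(window, budget, rng):
    """Choose up to `budget` indices uniformly at random; no labels are consulted."""
    k = min(budget, len(window))
    return rng.sample(range(len(window)), k)


def champion_biased_sample(window, budget, champion, rng):
    """Choose up to `budget` indices, balancing across the classes that the
    current champion *predicts* (true labels are still unknown at this point).

    Assumption: round-robin over predicted classes; other biasing rules are possible.
    """
    by_pred = defaultdict(list)
    for i, x in enumerate(window):
        by_pred[champion(x)].append(i)
    for idxs in by_pred.values():
        rng.shuffle(idxs)

    k = min(budget, len(window))
    chosen = []
    groups = list(by_pred.values())
    while len(chosen) < k:
        for idxs in groups:
            if idxs and len(chosen) < k:
                chosen.append(idxs.pop())
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy window: 100 scalar instances; the (hidden) minor class is x >= 0.95.
    window = [i / 100 for i in range(100)]
    true_label = lambda x: int(x >= 0.95)
    # Hypothetical, imperfect champion with a slightly wrong threshold.
    champion = lambda x: int(x >= 0.90)

    for name, picked in [
        ("uniform", uniform_sample(window, 10, rng)),
        ("champion-biased", champion_biased_sample(window, 10, champion, rng)),
    ]:
        # Labels are requested only after the sampling decision has been made.
        labels = [true_label(window[i]) for i in picked]
        print(name, "minor-class labels obtained:", sum(labels))
```

In this toy setting the biased protocol spends half of the budget on instances the champion predicts as the minor class, so it tends to obtain more minor-class labels than uniform sampling when the champion is at least roughly correct. If the champion never predicts a class, that class receives no dedicated share of the budget under this particular rule.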


Streaming data classification, Non-stationary, Class imbalance, Benchmarking



This research is supported by the Canadian Safety and Security Program (CSSP) E-Security grant. The CSSP is led by the Defense Research and Development Canada, Centre for Security Science (CSS) on behalf of the Government of Canada and its partners across all levels of government, response and emergency management organizations, nongovernmental agencies, industry and academia.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Sara Khanchi (1)
  • Malcolm I. Heywood (1)
  • Nur Zincir-Heywood (1)
  1. Faculty of Computer Science, Dalhousie University, Halifax, Canada
