Gene Expression Programming Ensemble for Classifying Big Datasets

  • Joanna Jȩdrzejowicz
  • Piotr Jȩdrzejowicz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10449)


The paper proposes a new batch ensemble classifier based on gene expression programming (GEP) and constructed using the stacked generalization concept. In our approach, base classifiers are combined by evolving a meta-gene from genes that GEP induces from randomly generated combinations of instances with randomly selected subsets of attributes. The main property of the classifier is its scalability, which allows it to adapt to the size of the dataset under consideration. To validate the proposed classifier, we carried out a computational experiment involving a number of publicly available benchmark datasets. The experimental results show that the approach assures good performance, scalability, and robustness.
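As a rough illustration of the stacking idea described in the abstract, the Python sketch below trains base classifiers on random instance samples restricted to random attribute subsets, then combines them at a meta level. It is not the authors' implementation. Simple threshold rules stand in for GEP-induced genes, and accuracy-based vote weights stand in for the evolved meta-gene. All function names, the synthetic data, and the parameter values are assumptions made for this example.

```python
import random


def train_stump(rows, labels, attrs):
    """Return the best one-attribute threshold rule found on the given rows.

    Stand-in for a GEP-induced gene (assumption, not the paper's method).
    """
    best = None  # (correct_count, attribute, threshold, polarity)
    for a in attrs:
        for t in sorted({r[a] for r in rows}):
            for polarity in (1, -1):
                correct = sum(
                    (1 if polarity * (r[a] - t) >= 0 else 0) == y
                    for r, y in zip(rows, labels)
                )
                if best is None or correct > best[0]:
                    best = (correct, a, t, polarity)
    _, a, t, polarity = best
    return lambda r: 1 if polarity * (r[a] - t) >= 0 else 0


def build_base_classifiers(X, y, n_base, n_rows, n_attrs, rng):
    """Train each base rule on a random instance sample and attribute subset."""
    base = []
    for _ in range(n_base):
        idx = [rng.randrange(len(X)) for _ in range(n_rows)]
        attrs = rng.sample(range(len(X[0])), n_attrs)
        base.append(train_stump([X[i] for i in idx], [y[i] for i in idx], attrs))
    return base


def fit_meta_weights(base, X_meta, y_meta):
    """Meta level: weight each base rule by (held-out accuracy - 0.5).

    Stand-in for evolving a meta-gene (assumption).
    """
    return [
        sum(clf(r) == t for r, t in zip(X_meta, y_meta)) / len(X_meta) - 0.5
        for clf in base
    ]


def predict(base, weights, row):
    """Weighted vote of the base rules; class 1 if the score is positive."""
    score = sum(w if clf(row) == 1 else -w for clf, w in zip(base, weights))
    return 1 if score > 0 else 0


def demo(seed=0):
    """Run the sketch on synthetic data where only attribute 0 is informative."""
    rng = random.Random(seed)

    def sample(n):
        X = [[rng.random() for _ in range(4)] for _ in range(n)]
        return X, [1 if r[0] > 0.5 else 0 for r in X]

    X_base, y_base = sample(300)
    X_meta, y_meta = sample(200)
    X_test, y_test = sample(200)
    base = build_base_classifiers(
        X_base, y_base, n_base=15, n_rows=30, n_attrs=3, rng=rng
    )
    weights = fit_meta_weights(base, X_meta, y_meta)
    hits = sum(predict(base, weights, r) == t for r, t in zip(X_test, y_test))
    return hits / len(X_test)


if __name__ == "__main__":
    print("test accuracy:", demo())
```

Because each base rule sees only `n_rows` instances and `n_attrs` attributes, the cost of training one rule does not grow with the full dataset. The number of base rules and the sample size can be adjusted to the data at hand, which is the kind of scalability the abstract refers to.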


Gene expression, Classification, Big data sets



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Institute of Informatics, Faculty of Mathematics, Physics and Informatics, University of Gdańsk, Gdańsk, Poland
  2. Department of Information Systems, Gdynia Maritime University, Gdynia, Poland
