Genetic Selection of Training Sets for (Not Only) Artificial Neural Networks

  • Jakub Nalepa
  • Michal Myller
  • Szymon Piechaczek
  • Krzysztof Hrynczenko
  • Michal Kawulok
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 928)


Creating high-quality training sets is the first step in designing robust classifiers. In practice, however, it is difficult when the data quality is questionable (the data is heterogeneous, noisy, and/or massively large). In this paper, we show how to apply a genetic algorithm to evolve training sets from data corpora, and we exploit it for artificial neural networks (ANNs) alongside other state-of-the-art models. ANNs have proved very successful in tackling a wide range of pattern recognition tasks, but they suffer from several drawbacks; selecting an appropriate network topology and training set is among the most challenging issues in practice, especially when ANNs are trained using time-consuming back-propagation. Our experimental study (coupled with statistical tests), performed on both real-life and benchmark datasets, confirmed the applicability of a genetic algorithm for selecting training data for various classifiers, which then generalize well to unseen data.
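To illustrate the general idea of evolving training sets with a genetic algorithm, the sketch below is a minimal, hypothetical example (not the authors' implementation): the dataset, genetic operators, and all parameter values are assumptions. Each individual is a bit-mask over a toy training corpus, and fitness is the validation accuracy of a 1-nearest-neighbour classifier trained on the selected subset.

```python
import random

random.seed(0)

# Toy 2-D corpus: two Gaussian classes (a stand-in for a real dataset).
def make_data(n, label, cx, cy):
    return [((cx + random.gauss(0, 1.0), cy + random.gauss(0, 1.0)), label)
            for _ in range(n)]

train = make_data(60, 0, 0, 0) + make_data(60, 1, 3, 3)
valid = make_data(20, 0, 0, 0) + make_data(20, 1, 3, 3)

def predict_1nn(subset, x):
    # 1-nearest-neighbour prediction using only the selected training points.
    _, label = min(subset,
                   key=lambda p: (p[0][0] - x[0])**2 + (p[0][1] - x[1])**2)
    return label

def fitness(mask):
    # Fitness = validation accuracy of the classifier trained on the subset.
    subset = [s for s, keep in zip(train, mask) if keep]
    if not subset:
        return 0.0
    correct = sum(predict_1nn(subset, x) == y for x, y in valid)
    return correct / len(valid)

def crossover(a, b):
    # Single-point crossover of two bit-masks.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.02):
    # Flip each bit with a small probability.
    return [bit ^ (random.random() < rate) for bit in mask]

# Standard generational GA with elitism over training-subset bit-masks.
pop = [[random.random() < 0.3 for _ in train] for _ in range(30)]
for gen in range(40):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:6]
    pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                   for _ in range(24)]

best = max(pop, key=fitness)
print("selected", sum(best), "of", len(train),
      "samples; validation accuracy", fitness(best))
```

The same loop applies to any classifier with a train/evaluate interface: only the fitness function changes, e.g. by training an ANN on the selected subset instead of running 1-NN.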


ANN · Genetic algorithm · Training set selection · Classification



JN and MK were supported by the National Science Centre, Poland, under Research Grant No. DEC-2017/25/B/ST6/00474, and JN was supported by the Silesian University of Technology under the Grant BKM-509/RAu2/2017. This work was partially supported by the Polish National Centre for Research and Development (Innomed grant, POIR.01.02.00-00-0030/15).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Jakub Nalepa (1, 2)
  • Michal Myller (1, 2)
  • Szymon Piechaczek (1, 2)
  • Krzysztof Hrynczenko (1, 2)
  • Michal Kawulok (1, 2)
  1. Silesian University of Technology, Gliwice, Poland
  2. Future Processing, Gliwice, Poland
