
Genetic Selection of Training Sets for (Not Only) Artificial Neural Networks

  • Conference paper
  • First Online:
Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety (BDAS 2018)

Abstract

Creating high-quality training sets is the first step in designing robust classifiers. However, it is fairly difficult in practice when the data quality is questionable (data is heterogeneous, noisy, and/or massively large). In this paper, we show how to apply a genetic algorithm to evolve training sets from data corpora, and we exploit it for artificial neural networks (ANNs) alongside other state-of-the-art models. ANNs have proved very successful in tackling a wide range of pattern recognition tasks. However, they suffer from several drawbacks, and selecting an appropriate network topology and training set is among the most challenging tasks in practice, especially when ANNs are trained using time-consuming back-propagation. Our experimental study (coupled with statistical tests), performed on both real-life and benchmark datasets, demonstrated the applicability of a genetic algorithm for selecting training data for various classifiers, which then generalize well to unseen data.
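The abstract's core idea, evolving a fixed-size training subset with a genetic algorithm and scoring candidates by how well a classifier trained on them performs on held-out data, can be sketched generically. The paper's actual chromosome encoding, genetic operators, and fitness function are not reproduced here, so every detail below (the 1-NN fitness, the subset size `K`, the toy two-class data) is an illustrative assumption, not the authors' method:

```python
import random

random.seed(0)

# Toy data: two overlapping 1-D classes (feature, label) pairs.
data = [(random.gauss(0.0, 1.0), 0) for _ in range(60)] + \
       [(random.gauss(2.0, 1.0), 1) for _ in range(60)]
random.shuffle(data)
pool, val = data[:80], data[80:]   # candidate pool vs. held-out validation set

K = 12  # fixed individual size, i.e. the training-subset size (an assumption)

def accuracy(subset_idx):
    """Fitness: 1-NN accuracy on the validation set using only chosen examples."""
    subset = [pool[i] for i in subset_idx]
    hits = 0
    for x, y in val:
        nearest = min(subset, key=lambda p: abs(p[0] - x))
        hits += nearest[1] == y
    return hits / len(val)

def crossover(a, b):
    """Child draws K distinct indices from the union of both parents."""
    return random.sample(sorted(set(a) | set(b)), K)

def mutate(ind, rate=0.2):
    """Swap a few chosen examples for currently unused pool examples."""
    ind = list(ind)
    for j in range(len(ind)):
        if random.random() < rate:
            ind[j] = random.choice([i for i in range(len(pool)) if i not in ind])
    return ind

# Evolve: elitist selection of the top half, children fill the rest.
population = [random.sample(range(len(pool)), K) for _ in range(20)]
for gen in range(30):
    population.sort(key=accuracy, reverse=True)
    parents = population[:10]
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(10)]
    population = parents + children

best = max(population, key=accuracy)
print(f"best subset fitness: {accuracy(best):.2f}")
```

The evolved individual is a small, fixed-size subset of the pool that scores well on unseen validation data, which mirrors the paper's goal of selecting compact training sets that generalize; real uses would replace the 1-NN fitness with the target model (an ANN, SVM, etc.).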


Notes

  1. As an example, there may be massive discrepancies between two equally experienced readers segmenting a medical image [21].

  2. In this paper, we use examples and vectors of features interchangeably.

  3. Also in the context of recent advances in the field of deep learning, where data augmentation (a process of extending training sets rather than reducing their size) became a critical step in designing deep network topologies [21].

  4. This database is available at https://www.uow.edu.au/~phung/download.html; last access: January 4, 2018.

  5. There exist approaches for updating the size of individuals dynamically [17]; however, we abstract from them in this paper.

  6. This dataset is available at https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original); last access: January 4, 2018.

References

  1. Abhishek, K., Singh, M., Ghosh, S., Anand, A.: Weather forecasting model using artificial neural network. Proced. Technol. 4, 311–318 (2012)

  2. Aibinu, A., Shafie, A., Salami, M.: Performance analysis of ANN based YCbCr skin detection algorithm. Proced. Eng. 41, 1183–1189 (2012)

  3. Balcázar, J., Dai, Y., Watanabe, O.: A random sampling technique for training support vector machines. In: Abe, N., Khardon, R., Zeugmann, T. (eds.) ALT 2001. LNCS, vol. 2225, pp. 119–134. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45583-3_11

  4. Cervantes, J., Lamont, F.G., López-Chau, A., Mazahua, L.R., Ruíz, J.S.: Data selection based on decision tree for SVM classification on large data sets. Appl. Soft Comput. 37, 787–798 (2015)

  5. Cho, S., Cha, K.: Evolution of neural network training set through addition of virtual samples. In: Proceedings of IEEE CEC, pp. 685–688 (1996)

  6. Chojaczyk, A., Teixeira, A., Neves, L., Cardoso, J., Soares, C.G.: Review and application of artificial neural networks models in reliability analysis of steel structures. Struct. Saf. 52, 78–89 (2015). http://www.sciencedirect.com/science/article/pii/S016747301400085X

  7. Ding, S., Li, H., Su, C., Yu, J., Jin, F.: Evolutionary artificial neural networks: a review. Artif. Intell. Rev. 39(3), 251–260 (2013)

  8. Hilado, S.D.F., Dadios, E.P., Gustilo, R.C.: Face detection using neural networks with skin segmentation. In: Proceedings of IEEE CIS, pp. 261–265 (2011)

  9. Himmelblau, D.M.: Applications of artificial neural networks in chemical engineering. Korean J. Chem. Eng. 17(4), 373–392 (2000)

  10. Kamp, R.G., Savenije, H.H.G.: Optimising training data for ANNs with genetic algorithms. Hydrol. Earth Syst. Sci. 10, 603–608 (2006)

  11. Kawulok, M., Nalepa, J.: Support vector machines training data selection using a genetic algorithm. In: Gimel’farb, G. (ed.) Structural, Syntactic, and Statistical Pattern Recognition. LNCS, vol. 7626, pp. 557–565. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34166-3_61

  12. Li, Y.: Selecting training points for one-class support vector machines. Pattern Recogn. Lett. 32(11), 1517–1522 (2011)

  13. Liu, B.: Application of artificial neural networks in computer-aided diagnosis. In: Cartwright, H. (ed.) Artificial Neural Networks. MIMB, vol. 1260, pp. 195–204. Springer, New York (2015). https://doi.org/10.1007/978-1-4939-2239-0_12

  14. Millard, K., Richardson, M.: On the importance of training data sample selection in random forest image classification: a case study in peatland ecosystem mapping. Remote Sens. 7(7), 8489–8515 (2015)

  15. Mirończuk, M., Protasiewicz, J.: A diversified classification committee for recognition of innovative internet domains. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015-2016. CCIS, vol. 613, pp. 368–383. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34099-9_29

  16. Moghaddam, A.H., Moghaddam, M.H., Esfandyari, M.: Stock market index prediction using artificial neural network. J. Econ. Finan. Adm. Sci. 21(41), 89–93 (2016)

  17. Nalepa, J., Kawulok, M.: Adaptive genetic algorithm to select training data for support vector machines. In: Esparcia-Alcázar, A.I., Mora, A.M. (eds.) EvoApplications 2014. LNCS, vol. 8602, pp. 514–525. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45523-4_42

  18. Nalepa, J., Kawulok, M.: Adaptive memetic algorithm enhanced with data geometry analysis to select training data for SVMs. Neurocomputing 185, 113–132 (2016). http://www.sciencedirect.com/science/article/pii/S0925231215019839

  19. Nalepa, J., Kawulok, M.: Selecting training sets for support vector machines: a review. Artif. Intell. Rev., pp. 1–44 (2018). https://doi.org/10.1007/s10462-017-9611-1

  20. Nguyen, H.B., Xue, B., Andreae, P.: Surrogate-model based particle swarm optimisation with local search for feature selection in classification. In: Squillero, G., Sim, K. (eds.) EvoApplications 2017. LNCS, vol. 10199, pp. 487–505. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55849-3_32

  21. Pawełczyk, K., et al.: Towards detecting high-uptake lesions from lung CT scans using deep learning. In: Battiato, S., Gallo, G., Schettini, R., Stanco, F. (eds.) ICIAP 2017. LNCS, vol. 10485, pp. 310–320. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68548-9_29

  22. Plechawska-Wojcik, M., Wolszczak, P.: Appling of neural networks to classification of brain-computer interface data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015-2016. CCIS, vol. 613, pp. 485–496. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34099-9_37

  23. Przybyła-Kasperek, M.: Two methods of combining classifiers, which are based on decision templates and theory of evidence, in a dispersed decision-making system. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015-2016. CCIS, vol. 613, pp. 109–119. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34099-9_7

  24. Reeves, C.R., Taylor, S.J.: Selection of training data for neural networks by a genetic algorithm. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H.-P. (eds.) PPSN 1998. LNCS, vol. 1498, pp. 633–642. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0056905

  25. Starosolski, R.: Lossless compression of medical and natural high bit depth sparse histogram images. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015. CCIS, vol. 521, pp. 363–376. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18422-7_32

  26. Wesolowski, M., Suchacz, B.: Artificial neural networks: theoretical background and pharmaceutical applications: a review. J. AOAC 95(3), 652–668 (2012)

  27. Yao, X., Islam, M.M.: Evolving artificial neural network ensembles. IEEE Comput. Intell. Mag. 3(1), 31–42 (2008)

  28. Zhang, G., Yan, P., Zhao, H., Zhang, X.: A computer aided diagnosis system in mammography using artificial neural networks. In: Proceedings of ICBEI, vol. 2, pp. 823–826 (2008)


Acknowledgments

JN and MK were supported by the National Science Centre, Poland, under Research Grant No. DEC-2017/25/B/ST6/00474, and JN was supported by the Silesian University of Technology under the Grant BKM-509/RAu2/2017. This work was partially supported by the Polish National Centre for Research and Development (Innomed grant, POIR.01.02.00-00-0030/15).

Author information

Corresponding author

Correspondence to Jakub Nalepa.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Nalepa, J., Myller, M., Piechaczek, S., Hrynczenko, K., Kawulok, M. (2018). Genetic Selection of Training Sets for (Not Only) Artificial Neural Networks. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety. BDAS 2018. Communications in Computer and Information Science, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-319-99987-6_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-99987-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99986-9

  • Online ISBN: 978-3-319-99987-6
