Abstract
Creating high-quality training sets is the first step in designing robust classifiers. In practice, however, this is difficult when data quality is questionable (the data are heterogeneous, noisy, and/or massively large). In this paper, we show how to apply a genetic algorithm to evolve training sets from data corpora, and we exploit it for artificial neural networks (ANNs) alongside other state-of-the-art models. ANNs have proven very successful in tackling a wide range of pattern recognition tasks, but they suffer from several drawbacks; selecting an appropriate network topology and training set is among the most challenging tasks in practice, especially when ANNs are trained using time-consuming back-propagation. Our experimental study (coupled with statistical tests), performed on both real-life and benchmark datasets, confirmed that a genetic algorithm can select training data for various classifiers which then generalize well to unseen data.
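The abstract describes the approach only at a high level. As a minimal, hypothetical sketch (our assumptions, not the authors' exact operators): an individual encodes a fixed-size subset of training-example indices, fitness is any score of a classifier trained on that subset, and the population evolves via tournament selection, union-based crossover, and random-replacement mutation:

```python
import random

def evolve_training_set(fitness, n_examples, subset_size,
                        pop_size=20, generations=30,
                        mutation_rate=0.1, seed=0):
    """Evolve a subset of example indices that maximises `fitness`.

    Sketch only: the paper's actual chromosome encoding and genetic
    operators may differ. `fitness` maps a list of indices to a score,
    e.g. validation accuracy of a classifier trained on that subset.
    """
    rng = random.Random(seed)
    # Each individual is a duplicate-free list of example indices.
    population = [rng.sample(range(n_examples), subset_size)
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        def tournament():
            # Binary tournament: the fitter of two random individuals.
            a, b = rng.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b
        offspring = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            # Crossover: draw the child from the union of both parents.
            pool = list(set(p1) | set(p2))
            child = rng.sample(pool, subset_size)
            # Mutation: occasionally swap in a fresh example index.
            for i in range(subset_size):
                if rng.random() < mutation_rate:
                    fresh = rng.randrange(n_examples)
                    if fresh not in child:
                        child[i] = fresh
            offspring.append(child)
        population = offspring
        best = max(population + [best], key=fitness)  # keep the elite
    return best
```

With a real classifier, `fitness` would wrap training and validation; a cheap proxy (e.g. overlap with a known-informative subset) is enough to exercise the search itself.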
Notes
1. As an example, there may be massive discrepancies between two equally experienced readers segmenting a medical image [21].
2. In this paper, we use the terms example and feature vector interchangeably.
3. Also in the context of recent advances in deep learning, where data augmentation (a process of extending training sets rather than reducing their size) became a critical step in designing deep network topologies [21].
4. This database is available at https://www.uow.edu.au/~phung/download.html; last access: January 4, 2018.
5. There exist approaches for updating the size of individuals dynamically [17]; however, we abstract from them in this paper.
6. This dataset is available at https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original); last access: January 4, 2018.
References
Abhishek, K., Singh, M., Ghosh, S., Anand, A.: Weather forecasting model using artificial neural network. Proced. Technol. 4, 311–318 (2012)
Aibinu, A., Shafie, A., Salami, M.: Performance analysis of ANN based YCbCr skin detection algorithm. Proced. Eng. 41, 1183–1189 (2012)
Balcázar, J., Dai, Y., Watanabe, O.: A random sampling technique for training support vector machines. In: Abe, N., Khardon, R., Zeugmann, T. (eds.) ALT 2001. LNCS, vol. 2225, pp. 119–134. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45583-3_11
Cervantes, J., Lamont, F.G., López-Chau, A., Mazahua, L.R., Ruíz, J.S.: Data selection based on decision tree for SVM classification on large data sets. Appl. Soft Comput. 37, 787–798 (2015)
Cho, S., Cha, K.: Evolution of neural network training set through addition of virtual samples. In: Proceedings of IEEE CEC, pp. 685–688 (1996)
Chojaczyk, A., Teixeira, A., Neves, L., Cardoso, J., Soares, C.G.: Review and application of artificial neural networks models in reliability analysis of steel structures. Struct. Saf. 52, 78–89 (2015). http://www.sciencedirect.com/science/article/pii/S016747301400085X
Ding, S., Li, H., Su, C., Yu, J., Jin, F.: Evolutionary artificial neural networks: a review. Artif. Intell. Rev. 39(3), 251–260 (2013)
Hilado, S.D.F., Dadios, E.P., Gustilo, R.C.: Face detection using neural networks with skin segmentation. In: Proceedings of IEEE CIS, pp. 261–265 (2011)
Himmelblau, D.M.: Applications of artificial neural networks in chemical engineering. Korean J. Chem. Eng. 17(4), 373–392 (2000)
Kamp, R.G., Savenije, H.H.G.: Optimising training data for ANNs with genetic algorithms. Hydrol. Earth Syst. Sci. 10, 603–608 (2006)
Kawulok, M., Nalepa, J.: Support vector machines training data selection using a genetic algorithm. In: Gimel’farb, G. (ed.) Structural, Syntactic, and Statistical Pattern Recognition. LNCS, vol. 7626, pp. 557–565. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34166-3_61
Li, Y.: Selecting training points for one-class support vector machines. Pattern Recogn. Lett. 32(11), 1517–1522 (2011)
Liu, B.: Application of artificial neural networks in computer-aided diagnosis. In: Cartwright, H. (ed.) Artificial Neural Networks. MIMB, vol. 1260, pp. 195–204. Springer, New York (2015). https://doi.org/10.1007/978-1-4939-2239-0_12
Millard, K., Richardson, M.: On the importance of training data sample selection in random forest image classification: a case study in peatland ecosystem mapping. Remote Sens. 7(7), 8489–8515 (2015)
Mirończuk, M., Protasiewicz, J.: A diversified classification committee for recognition of innovative internet domains. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015-2016. CCIS, vol. 613, pp. 368–383. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34099-9_29
Moghaddam, A.H., Moghaddam, M.H., Esfandyari, M.: Stock market index prediction using artificial neural network. J. Econ. Finan. Adm. Sci. 21(41), 89–93 (2016)
Nalepa, J., Kawulok, M.: Adaptive genetic algorithm to select training data for support vector machines. In: Esparcia-Alcázar, A.I., Mora, A.M. (eds.) EvoApplications 2014. LNCS, vol. 8602, pp. 514–525. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45523-4_42
Nalepa, J., Kawulok, M.: Adaptive memetic algorithm enhanced with data geometry analysis to select training data for SVMs. Neurocomputing 185, 113–132 (2016). http://www.sciencedirect.com/science/article/pii/S0925231215019839
Nalepa, J., Kawulok, M.: Selecting training sets for support vector machines: a review. Artif. Intell. Rev., pp. 1–44 (2018). https://doi.org/10.1007/s10462-017-9611-1
Nguyen, H.B., Xue, B., Andreae, P.: Surrogate-model based particle swarm optimisation with local search for feature selection in classification. In: Squillero, G., Sim, K. (eds.) EvoApplications 2017. LNCS, vol. 10199, pp. 487–505. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55849-3_32
Pawełczyk, K., et al.: Towards detecting high-uptake lesions from lung CT scans using deep learning. In: Battiato, S., Gallo, G., Schettini, R., Stanco, F. (eds.) ICIAP 2017. LNCS, vol. 10485, pp. 310–320. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68548-9_29
Plechawska-Wojcik, M., Wolszczak, P.: Applying of neural networks to classification of brain-computer interface data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015-2016. CCIS, vol. 613, pp. 485–496. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34099-9_37
Przybyła-Kasperek, M.: Two methods of combining classifiers, which are based on decision templates and theory of evidence, in a dispersed decision-making system. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015-2016. CCIS, vol. 613, pp. 109–119. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34099-9_7
Reeves, C.R., Taylor, S.J.: Selection of training data for neural networks by a genetic algorithm. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H.-P. (eds.) PPSN 1998. LNCS, vol. 1498, pp. 633–642. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0056905
Starosolski, R.: Lossless compression of medical and natural high bit depth sparse histogram images. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015. CCIS, vol. 521, pp. 363–376. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18422-7_32
Wesolowski, M., Suchacz, B.: Artificial neural networks: theoretical background and pharmaceutical applications: a review. J. AOAC 95(3), 652–668 (2012)
Yao, X., Islam, M.M.: Evolving artificial neural network ensembles. IEEE Comput. Intell. Mag. 3(1), 31–42 (2008)
Zhang, G., Yan, P., Zhao, H., Zhang, X.: A computer aided diagnosis system in mammography using artificial neural networks. In: Proceedings ICBEI, vol. 2, pp. 823–826 (2008)
Acknowledgments
JN and MK were supported by the National Science Centre, Poland, under Research Grant No. DEC-2017/25/B/ST6/00474, and JN was supported by the Silesian University of Technology under the Grant BKM-509/RAu2/2017. This work was partially supported by the Polish National Centre for Research and Development (Innomed grant, POIR.01.02.00-00-0030/15).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Nalepa, J., Myller, M., Piechaczek, S., Hrynczenko, K., Kawulok, M. (2018). Genetic Selection of Training Sets for (Not Only) Artificial Neural Networks. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety. BDAS 2018. Communications in Computer and Information Science, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-319-99987-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99986-9
Online ISBN: 978-3-319-99987-6