Abstract
Creating high-quality training sets is the first step in designing robust classifiers. In practice, however, this is difficult when data quality is questionable (the data are heterogeneous, noisy, and/or massively large). In this paper, we show how to apply a genetic algorithm to evolve training sets from data corpora, and we exploit it for artificial neural networks (ANNs) alongside other state-of-the-art models. ANNs have proven very successful in tackling a wide range of pattern recognition tasks, but they suffer from several drawbacks; selecting an appropriate network topology and training set is among the most challenging tasks in practice, especially when ANNs are trained using time-consuming back-propagation. Our experimental study (coupled with statistical tests), performed on both real-life and benchmark datasets, confirmed that a genetic algorithm can select training data for various classifiers which then generalize well to unseen data.
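The abstract describes the approach only at a high level. As a minimal, hypothetical sketch (our assumptions, not the authors' exact operators): an individual encodes a fixed-size subset of training-example indices, fitness is any score of a classifier trained on that subset, and the population evolves via tournament selection, union-based crossover, and random-replacement mutation:

```python
import random

def evolve_training_set(fitness, n_examples, subset_size,
                        pop_size=20, generations=30,
                        mutation_rate=0.1, seed=0):
    """Evolve a subset of example indices that maximises `fitness`.

    Sketch only: the paper's actual chromosome encoding and genetic
    operators may differ. `fitness` maps a list of indices to a score,
    e.g. validation accuracy of a classifier trained on that subset.
    """
    rng = random.Random(seed)
    # Each individual is a duplicate-free list of example indices.
    population = [rng.sample(range(n_examples), subset_size)
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        def tournament():
            # Binary tournament: the fitter of two random individuals.
            a, b = rng.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b
        offspring = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            # Crossover: draw the child from the union of both parents.
            pool = list(set(p1) | set(p2))
            child = rng.sample(pool, subset_size)
            # Mutation: occasionally swap in a fresh example index.
            for i in range(subset_size):
                if rng.random() < mutation_rate:
                    fresh = rng.randrange(n_examples)
                    if fresh not in child:
                        child[i] = fresh
            offspring.append(child)
        population = offspring
        best = max(population + [best], key=fitness)  # keep the elite
    return best
```

With a real classifier, `fitness` would wrap training and validation; a cheap proxy (e.g. overlap with a known-informative subset) is enough to exercise the search itself.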
Notes
1. As an example, there may be massive discrepancies between two equally experienced readers segmenting a medical image [21].
2. In this paper, we use the terms example and feature vector interchangeably.
3. Also in the context of recent advances in deep learning, where data augmentation (a process of extending training sets rather than reducing their size) became a critical step in designing deep network topologies [21].
4. This database is available at https://www.uow.edu.au/~phung/download.html; last access: January 4, 2018.
5. There exist approaches for updating the size of individuals dynamically [17]; however, we abstract from them in this paper.
6. This dataset is available at https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original); last access: January 4, 2018.
References
Abhishek, K., Singh, M., Ghosh, S., Anand, A.: Weather forecasting model using artificial neural network. Proced. Technol. 4, 311–318 (2012)
Aibinu, A., Shafie, A., Salami, M.: Performance analysis of ANN based YCbCr skin detection algorithm. Proced. Eng. 41, 1183–1189 (2012)
Balcázar, J., Dai, Y., Watanabe, O.: A random sampling technique for training support vector machines. In: Abe, N., Khardon, R., Zeugmann, T. (eds.) ALT 2001. LNCS, vol. 2225, pp. 119–134. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45583-3_11
Cervantes, J., Lamont, F.G., López-Chau, A., Mazahua, L.R., Ruíz, J.S.: Data selection based on decision tree for SVM classification on large data sets. Appl. Soft Comput. 37, 787–798 (2015)
Cho, S., Cha, K.: Evolution of neural network training set through addition of virtual samples. In: Proceedings of IEEE CEC, pp. 685–688 (1996)
Chojaczyk, A., Teixeira, A., Neves, L., Cardoso, J., Soares, C.G.: Review and application of artificial neural networks models in reliability analysis of steel structures. Struct. Saf. 52, 78–89 (2015). http://www.sciencedirect.com/science/article/pii/S016747301400085X
Ding, S., Li, H., Su, C., Yu, J., Jin, F.: Evolutionary artificial neural networks: a review. Artif. Intell. Rev. 39(3), 251–260 (2013)
Hilado, S.D.F., Dadios, E.P., Gustilo, R.C.: Face detection using neural networks with skin segmentation. In: Proceedings of IEEE CIS, pp. 261–265 (2011)
Himmelblau, D.M.: Applications of artificial neural networks in chemical engineering. Korean J. Chem. Eng. 17(4), 373–392 (2000)
Kamp, R.G., Savenije, H.H.G.: Optimising training data for ANNs with genetic algorithms. Hydrol. Earth Syst. Sci. 10, 603–608 (2006)
Kawulok, M., Nalepa, J.: Support vector machines training data selection using a genetic algorithm. In: Gimel’farb, G. (ed.) Structural, Syntactic, and Statistical Pattern Recognition. LNCS, vol. 7626, pp. 557–565. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34166-3_61
Li, Y.: Selecting training points for one-class support vector machines. Pattern Recogn. Lett. 32(11), 1517–1522 (2011)
Liu, B.: Application of artificial neural networks in computer-aided diagnosis. In: Cartwright, H. (ed.) Artificial Neural Networks. MIMB, vol. 1260, pp. 195–204. Springer, New York (2015). https://doi.org/10.1007/978-1-4939-2239-0_12
Millard, K., Richardson, M.: On the importance of training data sample selection in random forest image classification: a case study in peatland ecosystem mapping. Remote Sens. 7(7), 8489–8515 (2015)
Mirończuk, M., Protasiewicz, J.: A diversified classification committee for recognition of innovative internet domains. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015-2016. CCIS, vol. 613, pp. 368–383. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34099-9_29
Moghaddam, A.H., Moghaddam, M.H., Esfandyari, M.: Stock market index prediction using artificial neural network. J. Econ. Finan. Adm. Sci. 21(41), 89–93 (2016)
Nalepa, J., Kawulok, M.: Adaptive genetic algorithm to select training data for support vector machines. In: Esparcia-Alcázar, A.I., Mora, A.M. (eds.) EvoApplications 2014. LNCS, vol. 8602, pp. 514–525. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-45523-4_42
Nalepa, J., Kawulok, M.: Adaptive memetic algorithm enhanced with data geometry analysis to select training data for SVMs. Neurocomputing 185, 113–132 (2016). http://www.sciencedirect.com/science/article/pii/S0925231215019839
Nalepa, J., Kawulok, M.: Selecting training sets for support vector machines: a review. Artif. Intell. Rev., pp. 1–44 (2018). https://doi.org/10.1007/s10462-017-9611-1
Nguyen, H.B., Xue, B., Andreae, P.: Surrogate-model based particle swarm optimisation with local search for feature selection in classification. In: Squillero, G., Sim, K. (eds.) EvoApplications 2017. LNCS, vol. 10199, pp. 487–505. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55849-3_32
Pawełczyk, K., et al.: Towards detecting high-uptake lesions from lung CT scans using deep learning. In: Battiato, S., Gallo, G., Schettini, R., Stanco, F. (eds.) ICIAP 2017. LNCS, vol. 10485, pp. 310–320. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68548-9_29
Plechawska-Wojcik, M., Wolszczak, P.: Applying of neural networks to classification of brain-computer interface data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015-2016. CCIS, vol. 613, pp. 485–496. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34099-9_37
Przybyła-Kasperek, M.: Two methods of combining classifiers, which are based on decision templates and theory of evidence, in a dispersed decision-making system. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015-2016. CCIS, vol. 613, pp. 109–119. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-34099-9_7
Reeves, C.R., Taylor, S.J.: Selection of training data for neural networks by a genetic algorithm. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H.-P. (eds.) PPSN 1998. LNCS, vol. 1498, pp. 633–642. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0056905
Starosolski, R.: Lossless compression of medical and natural high bit depth sparse histogram images. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015. CCIS, vol. 521, pp. 363–376. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18422-7_32
Wesolowski, M., Suchacz, B.: Artificial neural networks: theoretical background and pharmaceutical applications: a review. J. AOAC 95(3), 652–668 (2012)
Yao, X., Islam, M.M.: Evolving artificial neural network ensembles. IEEE Comput. Intell. Mag. 3(1), 31–42 (2008)
Zhang, G., Yan, P., Zhao, H., Zhang, X.: A computer aided diagnosis system in mammography using artificial neural networks. In: Proceedings ICBEI, vol. 2, pp. 823–826 (2008)
Acknowledgments
JN and MK were supported by the National Science Centre, Poland, under Research Grant No. DEC-2017/25/B/ST6/00474, and JN was supported by the Silesian University of Technology under the Grant BKM-509/RAu2/2017. This work was partially supported by the Polish National Centre for Research and Development (Innomed grant, POIR.01.02.00-00-0030/15).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Nalepa, J., Myller, M., Piechaczek, S., Hrynczenko, K., Kawulok, M. (2018). Genetic Selection of Training Sets for (Not Only) Artificial Neural Networks. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Facing the Challenges of Data Proliferation and Growing Variety. BDAS 2018. Communications in Computer and Information Science, vol 928. Springer, Cham. https://doi.org/10.1007/978-3-319-99987-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99986-9
Online ISBN: 978-3-319-99987-6