Particle Swarm Optimization for Evolving Deep Convolutional Neural Networks for Image Classification: Single- and Multi-Objective Approaches

  • Bin Wang
  • Bing Xue
  • Mengjie ZhangEmail author
Part of the Natural Computing Series book series (NCS)


Convolutional neural networks (CNNs) are one of the most effective deep learning methods to solve image classification problems, but the design of the CNN architectures is mainly done manually, which is very time consuming and requires expertise in both problem domains and CNNs. In this chapter, we will describe an approach to the use of particle swarm optimization (PSO) for automatically searching for and learning the optimal CNN architectures. We will provide an encoding strategy inspired by computer networks to encode CNN layers and to allow the proposed method to learn variable-length CNN architectures by focusing only on the single objective of maximizing the classification accuracy. A surrogate dataset will be used to speed up the evolutionary learning process. We will also include a multi-objective way for PSO to evolve CNN architectures in the chapter. The PSO-based algorithms are examined and compared with state-of-the-art algorithms on a number of widely used image classification benchmark datasets. The experimental results show that the proposed algorithms are strong competitors to the state-of-the-art algorithms in terms of classification error. A major advantage of the proposed methods is the automated design of CNN architectures without requiring human intervention and good performance of the learned CNNs.


  1. 1.
    Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. Preprint. arXiv:1603.04467 (2016)Google Scholar
  2. 2.
    Assunção, F., Lourenço, N., Machado, P., Ribeiro, B.: Evolving the topology of large scale deep neural networks. In: European Conference on Genetic Programming, pp. 19–34. Springer, Berlin (2018)Google Scholar
  3. 3.
    Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010, pp. 177–186. Springer, Berlin (2010)Google Scholar
  4. 4.
    Bruna, J., Mallat, S.: Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1872–1886 (2013)Google Scholar
  5. 5.
    Chan, T.H., Jia, K., Gao, S., Lu, J., Zeng, Z., Ma, Y.: PCANet: a simple deep learning baseline for image classification? IEEE Trans. Image Process. 24(12), 5017–5032 (2015)MathSciNetzbMATHGoogle Scholar
  6. 6.
    Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)Google Scholar
  7. 7.
    Fuller, V., Li, T., Yu, J., Varadhan, K.: Classless inter-domain routing (CIDR): an address assignment and aggregation strategy (1993)Google Scholar
  8. 8.
    Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)Google Scholar
  9. 9.
    Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)Google Scholar
  10. 10.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015).
  11. 11.
    Hinton, G.E.: A practical guide to training restricted Boltzmann machines. In: Neural Networks: Tricks of the Trade, pp. 599–619. Springer, Berlin (2012)Google Scholar
  12. 12.
    Huang, G., Liu, Z., Weinberger, K.Q.: Densely connected convolutional networks. CoRR abs/1608.06993 (2016).
  13. 13.
    Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. Preprint. arXiv:1502.03167 (2015)Google Scholar
  14. 14.
    Jones, M.T.: Deep Learning Architectures and the Rise of Artificial Intelligence. IBM DeveloperWorks, Armonk (2017)Google Scholar
  15. 15.
    Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. Preprint. arXiv:1412.6980 (2014)Google Scholar
  16. 16.
    Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical Report. Citeseer (2009)Google Scholar
  17. 17.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)Google Scholar
  18. 18.
    Larochelle, H., Erhan, D., Courville, A., Bergstra, J., Bengio, Y.: An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th International Conference on Machine Learning, pp. 473–480. ACM, New York (2007)Google Scholar
  19. 19.
    Laumanns, M., Thiele, L., Deb, K., Zitzler, E.: Combining convergence and diversity in evolutionary multiobjective optimization. Evol. Comput. 10(3), 263–282 (2002)Google Scholar
  20. 20.
    LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)Google Scholar
  21. 21.
    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)Google Scholar
  22. 22.
    Li, X.: A non-dominated sorting particle swarm optimizer for multiobjective optimization. In: Genetic and Evolutionary Computation Conference, pp. 37–48. Springer, Berlin (2003)Google Scholar
  23. 23.
    Postel, J.: DoD standard internet protocol (1980)Google Scholar
  24. 24.
    Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y.L., Tan, J., Le, Q.V., Kurakin, A.: Large-scale evolution of image classifiers. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 2902–2911. JMLR. org (2017)Google Scholar
  25. 25.
    Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th International Conference on International Conference on Machine Learning, pp. 833–840. Omnipress, Madison (2011)Google Scholar
  26. 26.
    Sierra, M.R., Coello, C.A.C.: Improving PSO-based multi-objective optimization using crowding, mutation and ε-dominance. In: International Conference on Evolutionary Multi-Criterion Optimization, pp. 505–519. Springer, Berlin (2005)Google Scholar
  27. 27.
    Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014).
  28. 28.
    Sohn, K., Lee, H.: Learning invariant representations with local transformations. Preprint. arXiv:1206.6418 (2012)Google Scholar
  29. 29.
    Sohn, K., Zhou, G., Lee, C., Lee, H.: Learning and selecting features jointly with point-wise gated Boltzmann machines. In: International Conference on Machine Learning, pp. 217–225 (2013)Google Scholar
  30. 30.
    Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)Google Scholar
  31. 31.
    Suganuma, M., Shirakawa, S., Nagao, T.: A genetic programming approach to designing convolutional neural network architectures. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 497–504. ACM, New York (2017)Google Scholar
  32. 32.
    Sun, Y., Xue, B., Zhang, M., Yen, G.G.: Evolving deep convolutional neural networks for image classification. IEEE Trans. Evol. Comput. 24(2), 394–407 (2019). Google Scholar
  33. 33.
    Sun, Y., Yen, G.G., Yi, Z.: Evolving unsupervised deep neural networks for learning meaningful representations. IEEE Trans. Evol. Comput. 23(1), 89–103 (2019). Google Scholar
  34. 34.
    Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)Google Scholar
  35. 35.
    Van den Bergh, F., Engelbrecht, A.P.: A study of particle swarm optimization particle trajectories. Inf. Sci. 176(8), 937–971 (2006)MathSciNetzbMATHGoogle Scholar
  36. 36.
    Wang, B., Sun, Y., Xue, B., Zhang, M.: Evolving deep convolutional neural networks by variable-length particle swarm optimization for image classification. In: 2018 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8. IEEE, Piscataway (2018)Google Scholar
  37. 37.
    Wang, B., Sun, Y., Xue, B., Zhang, M.: A hybrid differential evolution approach to designing deep convolutional neural networks for image classification. In: Australasian Joint Conference on Artificial Intelligence, pp. 237–250. Springer, Berlin (2018)Google Scholar
  38. 38.
    Wang, B., Sun, Y., Xue, B., Zhang, M.: Evolving deep neural networks by multi-objective particle swarm optimization for image classification. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’19, pp. 490–498. ACM, New York (2019).
  39. 39.
    Wang, B., Sun, Y., Xue, B., Zhang, M.: A hybrid GA-PSO method for evolving architecture and short connections of deep convolutional neural networks. In: Nayak, A.C., Sharma, A. (eds.) PRICAI 2019: Trends in Artificial Intelligence, pp. 650–663. Springer International Publishing, Cham (2019)Google Scholar

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. 1.School of Engineering and Computer ScienceVictoria University of WellingtonWellingtonNew Zealand

Personalised recommendations