
Deep Learning Architecture Search by Neuro-Cell-Based Evolution with Function-Preserving Mutations

  • Martin Wistuba
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)

Abstract

The design of convolutional neural network architectures for a new image data set is a laborious and computationally expensive task that requires expert knowledge. We propose a novel neuro-evolutionary technique to solve this problem without human intervention. Our method represents a convolutional neural network architecture as a sequence of neuro-cells and keeps mutating them using function-preserving operations. This combination of approaches has several advantages. Defining the network architecture as a sequence of repeated neuro-cells reduces the complexity of the search space. Furthermore, these cells are potentially transferable and can be used to arbitrarily extend the complexity of the network. Mutations based on function-preserving operations guarantee a better parameter initialization than random initialization, so less training time is required per network architecture. Within 12 GPU hours, our proposed method finds neural network architectures that achieve a classification error of about 4% and 24% with only 5.5 and 6.5 million parameters on CIFAR-10 and CIFAR-100, respectively. Compared to competing approaches, our method achieves similar results but requires orders of magnitude less search time and, in many cases, fewer network parameters.
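To make the idea of a function-preserving mutation concrete, the following minimal Python/NumPy sketch (our illustration, not the paper's implementation) shows a Net2Net-style widening of a hidden layer in the spirit of Chen et al. (2016): the layer gains units, yet the child network computes exactly the same function as its parent, so training can resume from an initialization that already matches the parent's performance. All names and shapes here are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-layer linear network: y = W2 @ (W1 @ x).
    d_in, d_hidden, d_out = 4, 3, 2
    W1 = rng.normal(size=(d_hidden, d_in))
    W2 = rng.normal(size=(d_out, d_hidden))

    def widen(W1, W2, new_width):
        """Widen the hidden layer to `new_width` units without
        changing the function computed by the two layers."""
        d_hidden = W1.shape[0]
        # Pick which existing hidden units to replicate.
        idx = rng.integers(0, d_hidden, size=new_width - d_hidden)
        mapping = np.concatenate([np.arange(d_hidden), idx])
        # Replicated units copy their incoming weights verbatim.
        W1_new = W1[mapping]
        # Outgoing weights are divided among each unit and its
        # replicas, so the summed downstream contribution is unchanged.
        counts = np.bincount(mapping, minlength=d_hidden)
        W2_new = W2[:, mapping] / counts[mapping]
        return W1_new, W2_new

    W1w, W2w = widen(W1, W2, new_width=5)
    x = rng.normal(size=d_in)
    # The widened child computes exactly the same output as the parent.
    assert np.allclose(W2 @ (W1 @ x), W2w @ (W1w @ x))

The same principle carries over to the convolutional layers inside a neuro-cell: a replicated filter copies its incoming kernel, and the outgoing weights that consume its channel are split evenly among the replicas.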

Keywords

Automated machine learning · Neural architecture search · Evolutionary algorithms


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. IBM Research, Dublin, Ireland
