
Architecture-Aware Bayesian Optimization for Neural Network Tuning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11728)

Abstract

Hyperparameter optimization of a neural network is a non-trivial task: evaluating a hyperparameter setting is time-consuming, no analytical expression for the impact of the hyperparameters is available, and the evaluations are noisy in the sense that the result depends on the training process and the weight initialization. Bayesian optimization is a powerful tool for handling these problems. However, hyperparameter optimization of neural networks poses additional challenges, since the hyperparameters can be integer-valued, categorical, and/or conditional, whereas Bayesian optimization often assumes the variables to be real-valued. In this paper we present an architecture-aware transformation of neural network hyperparameters, applied in the kernel of a Gaussian process, to boost the performance of hyperparameter optimization.

The empirical experiment in this paper demonstrates that introducing an architecture-aware transformation in the kernel clearly improves the performance of the Bayesian optimizer over a naïve implementation, and that the results are comparable to other state-of-the-art methods.
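The paper body (not included in this preview) details the transformation itself; as a rough, minimal sketch of the idea the abstract describes, the code below embeds an input transformation for integer-valued and conditionally active hyperparameters inside a Gaussian process kernel. Everything here is an assumption for illustration: the rounding and masking steps, the squared-exponential kernel, and all function names and example dimensions are hypothetical, not the authors' actual construction.

```python
import numpy as np

def architecture_aware_transform(x, integer_dims, active_dims):
    """Map a raw hyperparameter vector into the space the kernel sees.

    Hypothetical illustration: integer-valued dimensions are rounded so
    the surrogate is constant between adjacent integers, and
    hyperparameters that the chosen architecture makes irrelevant
    (e.g. the width of a layer that does not exist) are masked out so
    they do not influence the covariance.
    """
    z = np.asarray(x, dtype=float).copy()
    z[integer_dims] = np.round(z[integer_dims])
    mask = np.zeros_like(z)
    mask[active_dims] = 1.0
    return z * mask

def kernel(x1, x2, integer_dims, active_dims, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel evaluated on the transformed inputs."""
    z1 = architecture_aware_transform(x1, integer_dims, active_dims)
    z2 = architecture_aware_transform(x2, integer_dims, active_dims)
    sqdist = np.sum((z1 - z2) ** 2)
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

# Two configurations that differ only within the same integer bin and in
# an inactive dimension are treated as identical by the surrogate:
x_a = np.array([0.01, 2.2, 128.0])  # [learning rate, #layers, width of layer 3]
x_b = np.array([0.01, 1.8, 512.0])  # layer 3 inactive when round(#layers) == 2
print(kernel(x_a, x_b, integer_dims=[1], active_dims=[0, 1]))  # prints 1.0
```

Under such a transformation the surrogate never distinguishes two configurations that the training run itself cannot distinguish, which is one plausible reading of why the abstract reports an improvement over a naïve kernel evaluated directly on the raw real-valued vectors.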

This research was supported by the project Root Cause Analysis of Quality Deviations in Manufacturing using Machine Learning (RCA-ML) in the funding program The smart digital factory (DNR 2016-04472), administered by VINNOVA, the Swedish Government Agency for Innovation Systems. It was also developed in the Fraunhofer Cluster of Excellence Cognitive Internet Technologies.



Author information


Corresponding author

Correspondence to Anders Sjöberg.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Sjöberg, A., Önnheim, M., Gustavsson, E., Jirstrand, M. (2019). Architecture-Aware Bayesian Optimization for Neural Network Tuning. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_19


  • DOI: https://doi.org/10.1007/978-3-030-30484-3_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30483-6

  • Online ISBN: 978-3-030-30484-3

  • eBook Packages: Computer Science, Computer Science (R0)
