Abstract
Hyperparameter optimization of a neural network is a non-trivial task. Evaluating a single hyperparameter setting is time-consuming, no analytical expression for the impact of the hyperparameters is available, and the evaluations are noisy in the sense that the result depends on the training process and the weight initialization. Bayesian optimization is a powerful tool for handling these problems. However, hyperparameter optimization of neural networks poses additional challenges, since the hyperparameters can be integer-valued, categorical, and/or conditional, whereas Bayesian optimization often assumes the variables to be real-valued. In this paper we present an architecture-aware transformation of neural networks, applied in the kernel of a Gaussian process, to boost the performance of hyperparameter optimization.
Our empirical experiments demonstrate that introducing this architecture-aware kernel transformation gives the Bayesian optimizer a clear improvement over a naïve implementation, with results comparable to other state-of-the-art methods.
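As a concrete illustration of the idea, the following is a minimal sketch (not the authors' code) of how a transformation of the hyperparameter space can be composed with a standard Gaussian-process kernel, in the spirit of Garrido-Merchán and Hernández-Lobato (2018): integer-valued hyperparameters are rounded inside the kernel, and conditional hyperparameters are masked out when their parent setting makes them inactive. The example search space (learning rate, integer layer count, conditional layer widths) and all names are illustrative assumptions, not taken from the paper.

# Minimal sketch (assumed, not the authors' implementation): a GP kernel
# that transforms hyperparameter vectors before computing covariances.
import numpy as np

def transform(x):
    """Map a raw hyperparameter vector into the space the kernel sees.

    x = [log10_learning_rate, n_layers, width_1, width_2, width_3]
    - n_layers is integer-valued: round it, so all real values in
      [k - 0.5, k + 0.5) are treated as the same configuration.
    - width_i is conditional on n_layers >= i: zero out (mask) the widths
      of layers that do not exist, so inactive dimensions cannot influence
      the covariance between configurations.
    """
    z = np.array(x, dtype=float).copy()
    z[1] = np.round(z[1])                # integer-valued dimension
    n_layers = int(z[1])
    for i in range(3):                   # conditional dimensions
        if i >= n_layers:
            z[2 + i] = 0.0               # mask widths of absent layers
    return z

def kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel evaluated on transformed inputs."""
    d = transform(x1) - transform(x2)
    return variance * np.exp(-0.5 * np.dot(d, d) / lengthscale**2)

# Two settings that differ only in the width of a non-existent third
# layer are identical after the transformation:
a = [-3.0, 2.2, 64, 32, 128]
b = [-3.0, 1.8, 64, 32, 512]
print(kernel(a, b))   # 1.0: the GP treats them as the same configuration

Because both configurations above map to the same transformed point, the kernel assigns them full correlation, so the surrogate model never wastes evaluations distinguishing settings that define the same network.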
Acknowledgements
This research was supported by the project Root Cause Analysis of Quality Deviations in Manufacturing using Machine Learning (RCA-ML) within the funding program The smart digital factory (DNR 2016-04472), administered by VINNOVA, the Swedish Government Agency for Innovation Systems. It was also developed in the Fraunhofer Cluster of Excellence Cognitive Internet Technologies.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Sjöberg, A., Önnheim, M., Gustavsson, E., Jirstrand, M. (2019). Architecture-Aware Bayesian Optimization for Neural Network Tuning. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. Lecture Notes in Computer Science, vol. 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_19
Print ISBN: 978-3-030-30483-6
Online ISBN: 978-3-030-30484-3