Abstract
Hyperparameter optimization of a neural network is a non-trivial task. Evaluating a single hyperparameter setting is time-consuming, no analytical expression for the impact of the hyperparameters is available, and the evaluations are noisy in the sense that the result depends on the training process and the weight initialization. Bayesian optimization is a powerful tool for handling these problems. However, hyperparameter optimization of neural networks poses additional challenges, since the hyperparameters can be integer-valued, categorical, and/or conditional, whereas Bayesian optimization often assumes the variables to be real-valued. In this paper we present an architecture-aware transformation of neural networks, applied in the kernel of a Gaussian process, to boost the performance of hyperparameter optimization.
Our empirical experiments demonstrate that introducing this architecture-aware kernel transformation gives the Bayesian optimizer a clear improvement over a naïve implementation, with results comparable to other state-of-the-art methods.
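As a concrete illustration of the idea, the following is a minimal sketch (not the authors' code) of how a transformation of the hyperparameter space can be composed with a standard Gaussian-process kernel, in the spirit of Garrido-Merchán and Hernández-Lobato (2018): integer-valued hyperparameters are rounded inside the kernel, and conditional hyperparameters are masked out when their parent setting makes them inactive. The example search space (learning rate, integer layer count, conditional layer widths) and all names are illustrative assumptions, not taken from the paper.

# Minimal sketch (assumed, not the authors' implementation): a GP kernel
# that transforms hyperparameter vectors before computing covariances.
import numpy as np

def transform(x):
    """Map a raw hyperparameter vector into the space the kernel sees.

    x = [log10_learning_rate, n_layers, width_1, width_2, width_3]
    - n_layers is integer-valued: round it, so all real values in
      [k - 0.5, k + 0.5) are treated as the same configuration.
    - width_i is conditional on n_layers >= i: zero out (mask) the widths
      of layers that do not exist, so inactive dimensions cannot influence
      the covariance between configurations.
    """
    z = np.array(x, dtype=float).copy()
    z[1] = np.round(z[1])                # integer-valued dimension
    n_layers = int(z[1])
    for i in range(3):                   # conditional dimensions
        if i >= n_layers:
            z[2 + i] = 0.0               # mask widths of absent layers
    return z

def kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel evaluated on transformed inputs."""
    d = transform(x1) - transform(x2)
    return variance * np.exp(-0.5 * np.dot(d, d) / lengthscale**2)

# Two settings that differ only in the width of a non-existent third
# layer are identical after the transformation:
a = [-3.0, 2.2, 64, 32, 128]
b = [-3.0, 1.8, 64, 32, 512]
print(kernel(a, b))   # 1.0: the GP treats them as the same configuration

Because both configurations above map to the same transformed point, the kernel assigns them full correlation, so the surrogate model never wastes evaluations distinguishing settings that define the same network.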
Acknowledgements
This research was supported by the project Root Cause Analysis of Quality Deviations in Manufacturing using Machine Learning (RCA-ML) within the funding program The smart digital factory (DNR 2016-04472), administered by VINNOVA, the Swedish Government Agency for Innovation Systems. It was also developed in the Fraunhofer Cluster of Excellence Cognitive Internet Technologies.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Sjöberg, A., Önnheim, M., Gustavsson, E., Jirstrand, M. (2019). Architecture-Aware Bayesian Optimization for Neural Network Tuning. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. Lecture Notes in Computer Science, vol. 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_19
Print ISBN: 978-3-030-30483-6
Online ISBN: 978-3-030-30484-3