Soft Computing, Volume 23, Issue 9, pp 2969–2977

Global optimization in machine learning: the design of a predictive analytics application

  • Antonio Candelieri
  • Francesco Archetti


Global optimization, especially Bayesian optimization, has become the tool of choice in hyperparameter tuning and algorithmic configuration to optimize the generalization capability of machine learning algorithms. The contribution of this paper is to extend this approach to a complex algorithmic pipeline for predictive analytics, based on time-series clustering and artificial neural networks. The software environment R has been used with mlrMBO, a comprehensive and flexible toolbox for sequential model-based optimization. Random forest has been adopted as the surrogate model, due to the nature of the decision variables (i.e., conditional and discrete hyperparameters) of the case studies considered. Two acquisition functions, expected improvement and lower confidence bound, have been considered and their results compared. The computational results, on a benchmark and a real-world dataset, show that even in a complex search space of up to 80 dimensions, involving integer, categorical, and conditional variables (i.e., hyperparameters), sequential model-based optimization is an effective solution, with lower confidence bound requiring fewer function evaluations than expected improvement to find the same optimal solution.
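The two acquisition functions named in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only, whereas the paper itself uses R with mlrMBO, and it is not the authors' implementation. It assumes a minimization problem and assumes that the surrogate's uncertainty is taken as the spread of per-tree predictions of a random forest, which is one common choice and is not confirmed by the abstract. The function names, the `kappa` parameter, and the candidate values are all hypothetical.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def forest_mean_std(tree_predictions):
    """Mean and standard deviation of per-tree predictions.

    Assumption: the spread among trees stands in for the surrogate's
    predictive uncertainty at a candidate configuration.
    """
    n = len(tree_predictions)
    mean = sum(tree_predictions) / n
    variance = sum((p - mean) ** 2 for p in tree_predictions) / n
    return mean, math.sqrt(variance)


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (minimization)."""
    if std <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(best - mean, 0.0)
    z = (best - mean) / std
    return (best - mean) * normal_cdf(z) + std * normal_pdf(z)


def lower_confidence_bound(mean, std, kappa=1.0):
    """Lower confidence bound; smaller values are more promising."""
    return mean - kappa * std


if __name__ == "__main__":
    # Hypothetical per-tree loss predictions for three candidate configurations.
    candidates = {
        "config_a": [0.30, 0.32, 0.31],
        "config_b": [0.20, 0.45, 0.35],
        "config_c": [0.40, 0.41, 0.39],
    }
    best_so_far = 0.33  # hypothetical best loss observed so far
    stats = {name: forest_mean_std(preds) for name, preds in candidates.items()}
    # EI is maximized, LCB is minimized, to pick the next point to evaluate.
    next_by_ei = max(stats, key=lambda n: expected_improvement(*stats[n], best_so_far))
    next_by_lcb = min(stats, key=lambda n: lower_confidence_bound(*stats[n]))
    print("next by EI:", next_by_ei)
    print("next by LCB:", next_by_lcb)
```

In a sequential model-based loop, the chosen configuration would be evaluated, added to the observations, and the surrogate refitted before the next selection. Larger values of the hypothetical `kappa` push the lower confidence bound toward exploring uncertain regions.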


Hyperparameters optimization; Global optimization; Machine learning


Compliance with ethical standards

Conflict of interest

Antonio Candelieri and Francesco Archetti declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milan, Italy
