Abstract
This paper presents an automated surrogate model selection framework called Concurrent Surrogate Model Selection (COSMOS). Unlike most existing techniques, COSMOS coherently operates at three levels, namely: 1) selecting the model type (e.g., RBF or Kriging), 2) selecting the kernel function type (e.g., cubic or multiquadric kernel in RBF), and 3) determining the optimal values of the typically user-prescribed hyper-parameters (e.g., shape parameter in RBF). The quality of the models is determined and compared using measures of median and maximum error, given by the Predictive Estimation of Model Fidelity (PEMF) method. PEMF is a robust implementation of sequential k-fold cross-validation. The selection process undertakes either a cascaded approach over the three levels or a more computationally efficient one-step approach that solves a mixed-integer nonlinear programming problem. Genetic algorithms are used to perform the optimal selection. Application of COSMOS to benchmark test functions resulted in optimal model choices that agree well with those given by analyzing the model errors on a large set of additional test points. For the four analytical benchmark problems and three practical engineering applications – airfoil design, window heat transfer modeling, and building energy modeling – diverse forms of models/kernels are observed to be selected as optimal choices. These observations further establish the need for automated multi-level model selection that is also guided by dependable measures of model fidelity.
References
Acar E (2010) Optimizing the shape parameters of radial basis functions: An application to automobile crashworthiness. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering 224(12):1541–1553
Acar E, Rais-Rohani M (2009) Ensemble of metamodels with optimized weight factors. Struct Multidiscip Optim 37(3):279–294
Ali MM, Khompatraporn C, Zabinsky ZB (2005) A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. J Glob Optim 31(4):635–672
Ascione F, Bianco N, Stasio CD, Mauro GM, Vanoli GP (2017) Artificial neural networks to predict energy performance and retrofit scenarios for any member of a building category: a novel approach. Energy 26(118):999–1017
Basak D, Srimanta P, Patranabis DC (2007) Support vector regression. Neural Information Processing-Letters and Review 11(10):203–224
Ben-Hur A, Weston J (2010) A user’s guide to support vector machines. Data mining techniques for the life sciences, pp 223–239
Blatman G, Sudret B (2011) Adaptive sparse polynomial chaos expansion based on least angle regression. J Comput Phys 230(6):2345–2367
Bozdogan H (2000) Akaike’s information criterion and recent developments in information complexity. J Math Psychol 44:62–91
Chang C-C, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27
Chen PW, Wang JY, Lee HM (2004) Model selection of SVMs using GA approach. In: IEEE international joint conference on neural networks, 2004. Proceedings. 2004, IEEE, vol 3, pp 2035–2040
Chen X, Yang H, Sun K (2017) Developing a meta-model for sensitivity analyses and prediction of building performance for passively designed high-rise residential buildings. Appl Energy 194:422–439
Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge Books
Coelho F, Breitkopf P, Knopf-Lenoir C (2008) Model reduction for multidisciplinary optimization: application to a 2d wing. Struct Multidiscip Optim 37(1):29–48
Couckuyt I, Dhaene T, Demeester P (2014) ooDACE toolbox: a flexible object-oriented Kriging implementation. J Mach Learn Res 15(1):3183–3186
Crawley DB, Pedersen CO, Lawrie LK, Winkelmann FC (2000) EnergyPlus: energy simulation program. ASHRAE J 49(4)
Cressie N (1993) Statistics for spatial data. Wiley, New York
Deb K (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
Deru M, Field K, Studer D, Benne K, Griffith B, Torcellini P, Liu B (2011) US department of energy commercial reference building models of the national building stock. Tech. rep., Department of Energy
Deschrijver D, Dhaene T (2005) An alternative approach to avoid overfitting for surrogate models. In: Signal propagation on interconnects, 2005. Proceedings. 9th IEEE workshop, pp 111– 114
DOE (2017) Commercial prototype building models. http://www.energycodes.gov/development/commercial (accessed Jan 15, 2017)
Fang A, Rais-Rohani M, Liu Z, Horstemeyer MF (2005) A comparative study of metamodeling methods for multiobjective crashworthiness optimization. Comput Struct 83(25):2121–2136
Forrester A, Keane A (2009) Recent advances in surrogate-based optimization. Prog Aerosp Sci 45(1-3):50–79
Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley
Giovanis DG, Papaioannou I, Straub D, Papadopoulos V (2017) Bayesian updating with subset simulation using artificial neural networks. Comput Methods Appl Mech Eng 319:124–145
Giunta AA, Watson L (1998) A comparison of approximation modeling techniques: polynomial versus interpolating models. AIAA Journal (AIAA-98-4758)
Goel T, Stander N (2009) Comparing three error criteria for selecting radial basis function network topology. Comput Methods Appl Mech Eng 198:2137–2150
Gorissen D, Dhaene T, Turck FD (2009) Evolutionary model type selection for global surrogate modeling. J Mach Learn Res 10:2039–2078
Gorissen D, Couckuyt I, Demeester P, Dhaene T, Crombecq K (2010) A surrogate modeling and adaptive sampling toolbox for computer based design. J Mach Learn Res 11:2051–2055
Haftka RT, Villanueva D, Chaudhuri A (2016) Parallel surrogate-assisted global optimization with expensive functions – a survey. Struct Multidiscip Optim 54(1):3–13
Hamza K, Saitou K (2012) A co-evolutionary approach for design optimization via ensembles of surrogates with application to vehicle crashworthiness. J Mech Des 134(1):011001
Hardy RL (1971) Multiquadric equations of topography and other irregular surfaces. J Geophys Res 76:1905–1915
Holena M, Demut R (2011) Assessing the suitability of surrogate models in evolutionary optimization. In: Information technologies, pp 31–38
Jakeman JD, Narayan A, Zhou T (2017) A generalized sampling and preconditioning scheme for sparse approximation of polynomial chaos expansions. SIAM J Sci Comput 39(3):A1114–A1144
Jia G, Taflanidis AA (2013) Kriging metamodeling for approximation of high-dimensional wave and surge responses in real-time storm/hurricane risk assessment. Comput Methods Appl Mech Eng 261:24–38
Jin R, Chen W, Simpson TW (2001) Comparative studies of metamodelling techniques under multiple modelling criteria. Struct Multidiscip Optim 23(1):1–13
Lee H, Jo Y, Lee D, Choi S (2016) Surrogate model based design optimization of multiple wing sails considering flow interaction effect. Ocean Eng 121:422–436
Li YF, Ng SH, Xie M, Goh TN (2010) A systematic comparison of metamodeling techniques for simulation optimization in decision support systems. Appl Soft Comput 10(2):255–268
Lin S (2011) An NSGA-II program in MATLAB, version 1.4 ed
Lophaven SN, Nielsen HB, Sondergaard J (2002) DACE – a MATLAB Kriging toolbox, version 2.0. Tech. Rep. IMM-REP-2002-12. Informatics and Mathematical Modelling Report, Technical University of Denmark
Martin JD, Simpson TW (2005) Use of kriging models to approximate deterministic computer models. AIAA J 43(4):853–863
Mehmani A, Chowdhury S, Messac A (2015a) Predictive quantification of surrogate model fidelity based on modal variations with sample density. Struct Multidiscip Optim 52(2):353–373
Mehmani A, Chowdhury S, Tong W, Messac A (2015b) Adaptive switching of variable-fidelity models in population-based optimization. In: Engineering and applied sciences optimization, computational methods in applied sciences, vol 38. Springer International Publishing, pp 175–205
Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307
Mongillo M (2011) Choosing basis functions and shape parameters for radial basis function methods. In: SIAM undergraduate research online
Qudeiri JEA, Khadra FYA, Umer U, Hussein HMA (2015) Response surface metamodel to predict springback in sheet metal air bending process. International Journal of Materials, Mechanics and Manufacturing 3(4):203–224
Queipo N, Haftka R, Shyy W, Goel T, Vaidyanathan R, Tucker P (2005) Surrogate-based analysis and optimization. Prog Aerosp Sci 41(1):1–28
Reute IM, Mailach VR, Becker KH, Fischersworring-Bunk A, Schlums H, Ivankovic M (2017) Moving least squares metamodels – hyperparameter, variable reduction and model selection. In: 14th international probabilistic workshop. Springer International Publishing, pp 63–80
Rippa S (1999) An algorithm for selecting a good value for the parameter c in radial basis function interpolation. Adv Comput Math 11(2-3):193–210
Roustant O, Ginsbourger D, Deville Y (2012) DiceKriging, DiceOptim: two R packages for the analysis of computer experiments by Kriging-based metamodeling and optimization. J Stat Softw 51(1):518–523
Soares C, Brazdil PB, Kuba P (2004) A meta-learning method to select the kernel width in support vector regression. Mach Learn 54(3):195–209
Solomatine D, Ostfeld A (2008) Data-driven modelling: some past experiences and new approaches. J Hydroinf 10(1):3–22
Takahashi R, Prasai D, Adams BL, Mattson CA (2012) Hybrid bishop-hill model for elastic-yield limited design with non-orthorhombic polycrystalline metals. J Eng Mater Technol 134(1):0110,031–12
Tian W (2013) A review of sensitivity analysis methods in building energy analysis. Renew Sust Energ Rev 20:411–419
Viana FAC, Haftka RT, Steffen V (2009) Multiple surrogates: how cross-validation errors can help us to obtain the best predictor. Struct Multidiscip Optim 39:439–457
Viana FAC, Venter G, Balabanov V (2010) An algorithm for fast optimal latin hypercube design of experiments. Int J Numer Methods Eng 82(2):135–156
Zhang J, Messac A, Zhang J, Chowdhury S (2014) Adaptive optimal design of active thermoelectric windows using surrogate modeling. Optim Eng 15(2):469–483
Zhang M, Gou W, Li L, Yang F, Yue Z (2016) Multidisciplinary design and multi-objective optimization on guide fins of twin-web disk using Kriging surrogate model. Struct Multidiscip Optim 55(1):361–373
Zhang Y, Park C, Kim NH, Haftka RT (2017) Function prediction at one inaccessible point using converging lines. J Mech Des 139(5):051402
Acknowledgements
Support from the National Science Foundation (NSF) Awards CMMI-1642340 and CNS-1524628 is gratefully acknowledged. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF.
Author information
Contributions
The different core concepts underlying PEMF and COSMOS were conceived, implemented, and tested (in MATLAB) by Ali Mehmani and Souma Chowdhury, with important conceptual contributions from Achille Messac with regard to the surrogate modeling paradigm. The airfoil design and building peak cooling models in this paper were developed and implemented by Ali Mehmani, with support from Christoph Meinrenken on the latter.
Additional information
Parts of this manuscript have been presented at the ASME International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, 2014, Buffalo, NY, Paper No. DETC2014-35358.
Appendices
Appendix A: Surrogate model candidates
1.1 Radial basis function (RBF)
The idea of using Radial Basis Functions (RBF) as approximation functions was conceived by Hardy (1971). The RBF approximation is a linear combination of basis functions (ψ) computed with respect to each sample point, as given by

$$\tilde{f}(x) = \sum\limits_{i=1}^{n_{p}} w_{i}\, \psi\left(\left\| x - x_{i} \right\|\right) \qquad (9)$$

In (9), \(n_{p}\) denotes the number of selected sample points; the \(w_{i}\) are the weights, estimated using the pseudo-inverse method based on the training data; and ψ is the basis function, expressed in terms of the Euclidean distance, \(r = \|x - x_{i}\|\), of a point x from a given sample point \(x_{i}\). The different basis functions considered in this paper are listed in Table 5, where σ represents the shape parameter of the basis function. The shape parameter has a strong impact on the accuracy of the trained RBF: a smaller shape parameter often corresponds to a wider basis function, and σ = 0 corresponds to a constant basis function (Mongillo 2011).
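To make the pseudo-inverse weight estimation concrete, here is a minimal numpy sketch of RBF training and prediction with the multiquadric basis. The function names, shape parameter value, and the test response are illustrative choices, not part of COSMOS.

```python
import numpy as np

def rbf_fit(X, y, psi):
    """Estimate the RBF weights w by the pseudo-inverse method:
    solve Psi @ w = y, where Psi[i, j] = psi(||x_i - x_j||)."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.linalg.pinv(psi(r)) @ y

def rbf_predict(X_train, w, psi, X_new):
    """Evaluate the surrogate: f(x) = sum_i w_i * psi(||x - x_i||)."""
    r = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    return psi(r) @ w

# Multiquadric basis with shape parameter sigma (one of the kernels in Table 5)
sigma = 1.0
multiquadric = lambda r: np.sqrt(r**2 + sigma**2)

rng = np.random.default_rng(0)
X = rng.random((30, 2))                   # 30 training points in [0, 1]^2
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2    # illustrative response
w = rbf_fit(X, y, multiquadric)
```

Because this RBF form interpolates the training data, the fitted model reproduces the training responses to numerical precision.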
1.2 Kriging
Kriging (Giunta and Watson 1998) is an approach to approximating irregular data. The Kriging approximation function consists of two components: (i) a global trend function, and (ii) a deviation function representing the departure from the trend function. The trend function is generally a polynomial (e.g., constant, linear, or quadratic). The general form of the Kriging surrogate model is given by Cressie (1993):

$$\bar{F}(x) = \hat{f}(x) + Z(x) \qquad (10)$$

where \(\bar {F}(x)\) is the unknown function of interest, Z(x) is the realization of a stochastic process with zero mean and nonzero covariance, and \(\hat {f}\) is the known approximation (trend) function,

$$\hat{f}(x) = \varphi^{T} f(x) \qquad (11)$$

where φ is the regression parameter matrix. The (i,j)-th element of the covariance matrix of Z(x) is given by

$$\operatorname{Cov}\left[Z(x_{i}), Z(x_{j})\right] = {\sigma_{z}^{2}} R_{ij} \qquad (12)$$
where \(R_{ij}\) is the correlation function between the i-th and the j-th data points, and \({\sigma _{z}^{2}}\) is the process variance, which scales the spatial correlation function. The popular types of correlation functions are listed in Table 5. The correlation function controls the smoothness of the Kriging estimate by governing the influence of nearby points on the point of interest. In Kriging, the regression function coefficients, the process variance, and the correlation function parameters, \(\{\varphi ,{\sigma _{z}^{2}},\theta \}\), can each be predefined or estimated using parameter estimation methods such as Maximum Likelihood Estimation (MLE). In this paper, the regression function coefficients and the process variance are estimated using MLE, as given by

$$\hat{\varphi} = \left(F^{T} R^{-1} F\right)^{-1} F^{T} R^{-1} Y, \qquad \hat{\sigma}_{z}^{2} = \frac{1}{n_{p}} \left(Y - F\hat{\varphi}\right)^{T} R^{-1} \left(Y - F\hat{\varphi}\right) \qquad (13)$$

where \(Y = [y_{1}\ y_{2}\ \ldots]\) is the vector of actual outputs at the training points, R is the correlation matrix, and F is the matrix of the regression basis f(x) evaluated at each training point (Martin and Simpson 2005). The hyper-parameter θ in the correlation function is determined by solving a nonlinear hyper-parameter optimization problem. In this paper, a Kriging model with a first-order regression polynomial is used. To estimate the regression function coefficients and the process variance in Kriging, the DACE (design and analysis of computer experiments) package, developed by Lophaven et al. (2002), is used.
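As a concrete illustration of the closed-form MLE expressions above, the following numpy sketch fits a Kriging model with a constant trend and a Gaussian correlation function for a fixed θ. This is a simplified stand-in for the DACE toolbox, not its actual implementation; the nugget term, θ value, and test response are illustrative assumptions.

```python
import numpy as np

def kriging_fit(X, y, theta):
    """MLE of the regression coefficient phi and process variance sigma_z^2
    for a fixed Gaussian-correlation hyper-parameter theta (constant trend)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    R = np.exp(-theta * d2)                   # Gaussian correlation matrix R_ij
    R += 1e-10 * np.eye(len(X))               # tiny nugget for numerical stability
    F = np.ones((len(X), 1))                  # constant regression basis f(x) = 1
    Ri = np.linalg.inv(R)
    phi = float(np.linalg.solve(F.T @ Ri @ F, F.T @ Ri @ y))  # GLS / MLE of phi
    resid = y - phi
    sigma2 = float(resid @ Ri @ resid) / len(X)               # MLE of sigma_z^2
    return Ri, phi, sigma2

def kriging_predict(X, y, Ri, phi, theta, x_new):
    """Best linear unbiased prediction at a new point x_new."""
    r = np.exp(-theta * np.sum((X - x_new) ** 2, axis=-1))    # correlations to x_new
    return phi + r @ Ri @ (y - phi)

rng = np.random.default_rng(1)
X = rng.random((20, 2))
y = np.sin(3 * X[:, 0]) * X[:, 1]
Ri, phi, sigma2 = kriging_fit(X, y, theta=10.0)
```

Since Kriging interpolates, the prediction at a training point recovers the training response (up to the nugget-induced error).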
1.3 Support vector regression (SVR)
Support Vector Regression (SVR) is a relatively new regression-type surrogate model. For a given training set of input-output pairs \((x_{i}, y_{i})\), i = 1, …, \(n_{p}\), where \(x_{i} \in \mathbb{R}^{n}\) and \(y_{i} \in \mathbb{R}\), a linear SVR is defined by \(f(x) = \langle w, x \rangle + b\), where b is a bias and ⟨·,·⟩ denotes the dot product. To train the SVR, the error, \(|\xi| = |y - f(x)|\), is minimized by solving the following convex optimization problem:

$$\begin{aligned} \min_{w,\, b,\, \xi,\, \tilde{\xi}} \quad & \frac{1}{2}\|w\|^{2} + C \sum\limits_{i=1}^{n_{p}} \left(\xi_{i} + \tilde{\xi}_{i}\right) \\ \text{s.t.} \quad & y_{i} - \langle w, x_{i} \rangle - b \leq \varepsilon + \xi_{i} \\ & \langle w, x_{i} \rangle + b - y_{i} \leq \varepsilon + \tilde{\xi}_{i} \\ & \xi_{i},\, \tilde{\xi}_{i} \geq 0, \quad i = 1, \ldots, n_{p} \end{aligned} \qquad (14)$$

In (14), ε ≥ 0 is the tolerance on the difference between the actual and the predicted values; \(\xi _{i}\) and \(\tilde {\xi }_{i}\) are the slack variables; C controls the trade-off between the flatness of the function and the amount by which deviations larger than ε are tolerated; and \(n_{p}\) is the number of training points. By applying kernel functions, \(K(\alpha, \beta) = \langle \phi(\alpha), \phi(\beta) \rangle\), under the KKT conditions, the original problem is mapped into a higher-dimensional space. The dual form of SVR for nonlinear regression can be represented as

$$f(x) = \sum\limits_{i=1}^{n_{p}} \left(\alpha_{i} - \tilde{\alpha}_{i}\right) K(x_{i}, x) + b \qquad (15)$$
The standard kernel functions used in SVR are listed in Table 5. The performance of SVR depends on its penalty parameter C and the kernel parameters γ, r, and d. Using hyper-parameter optimization, these parameters can be estimated so as to minimize the model error. To implement SVR in this paper, the LIBSVM (A Library for Support Vector Machines) package (Chang and Lin 2011) is used.
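The structure of the dual-form predictor can be sketched directly in numpy. Here the net dual coefficients (α_i − α̃_i), the support vectors, and the bias are illustrative placeholders rather than the solution of the dual QP, which a solver such as LIBSVM would compute.

```python
import numpy as np

def rbf_kernel(A, x, gamma=1.0):
    """Gaussian (RBF) kernel K(a, x) = exp(-gamma * ||a - x||^2),
    one of the standard SVR kernels listed in Table 5."""
    return np.exp(-gamma * np.sum((A - x) ** 2, axis=-1))

def svr_predict(X_sv, dual_coef, b, x, gamma=1.0):
    """Dual-form SVR prediction: f(x) = sum_i (alpha_i - alpha_tilde_i) K(x_i, x) + b."""
    return float(dual_coef @ rbf_kernel(X_sv, x, gamma) + b)

# Illustrative support vectors and placeholder net dual coefficients
X_sv = np.array([[0.0], [1.0]])
dual_coef = np.array([1.0, -0.5])
b = 0.2
f0 = svr_predict(X_sv, dual_coef, b, np.array([0.0]))
```

Only the support vectors (points with nonzero net dual coefficients) contribute to the prediction, which is what keeps trained SVR models sparse.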
Appendix B: Analytical test functions
Branin-Hoo function (2 variables):

$$f(x) = \left(x_{2} - \frac{5.1}{4\pi^{2}}x_{1}^{2} + \frac{5}{\pi}x_{1} - 6\right)^{2} + 10\left(1 - \frac{1}{8\pi}\right)\cos(x_{1}) + 10$$

where \(x_{1} \in [-5, 10]\), \(x_{2} \in [0, 15]\).
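A direct implementation of the Branin-Hoo function; its known global minimum value, 10/(8π) ≈ 0.397887, provides a sanity check.

```python
import numpy as np

def branin(x1, x2):
    """Branin-Hoo benchmark; x1 in [-5, 10], x2 in [0, 15]."""
    b = 5.1 / (4 * np.pi ** 2)
    c = 5 / np.pi
    return (x2 - b * x1 ** 2 + c * x1 - 6) ** 2 \
        + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10
```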
Hartmann function (3 variables):

$$f(x) = -\sum\limits_{i=1}^{4} c_{i} \exp\left(-\sum\limits_{j=1}^{n} A_{ij}\left(x_{j} - P_{ij}\right)^{2}\right)$$

where \(x = (x_{1}\ x_{2} \ldots x_{n})\), \(x_{i} \in [0, 1]\).

In this function, the number of variables, n = 3; the constants c, A, and P are respectively a 1 × 4 vector, a 4 × 3 matrix, and a 4 × 3 matrix:

$$c = (1,\ 1.2,\ 3,\ 3.2), \quad A = \begin{pmatrix} 3 & 10 & 30\\ 0.1 & 10 & 35\\ 3 & 10 & 30\\ 0.1 & 10 & 35 \end{pmatrix}, \quad P = 10^{-4}\begin{pmatrix} 3689 & 1170 & 2673\\ 4699 & 4387 & 7470\\ 1091 & 8732 & 5547\\ 381 & 5743 & 8828 \end{pmatrix}$$
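The Hartmann-3 function can be evaluated with a few lines of numpy, using the standard constants c, A, and P for the 3-variable case; its known global minimum, ≈ −3.86278, serves as a sanity check.

```python
import numpy as np

# Standard Hartmann-3 constants: c (1 x 4), A (4 x 3), P (4 x 3)
c = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([[3.0, 10, 30], [0.1, 10, 35], [3.0, 10, 30], [0.1, 10, 35]])
P = 1e-4 * np.array([[3689, 1170, 2673], [4699, 4387, 7470],
                     [1091, 8732, 5547], [381, 5743, 8828]])

def hartmann3(x):
    """Hartmann benchmark for n = 3; x_i in [0, 1]."""
    x = np.asarray(x, dtype=float)
    inner = np.sum(A * (x - P) ** 2, axis=1)   # one exponent per i = 1..4
    return -np.sum(c * np.exp(-inner))
```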
Perm function (10 variables):

$$f(x) = \sum\limits_{k=1}^{n} \left[ \sum\limits_{j=1}^{n} \left(j^{k} + \beta\right) \left(\left(\frac{x_{j}}{j}\right)^{k} - 1\right) \right]^{2}$$

where \(x_{j} \in [-n, n]\) and β is a fixed parameter.
Dixon & Price function (50 variables):

$$f(x) = \left(x_{1} - 1\right)^{2} + \sum\limits_{i=2}^{n} i \left(2x_{i}^{2} - x_{i-1}\right)^{2}$$

where \(x_{i} \in [-10, 10]\).
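The standard Dixon & Price form is easy to verify numerically: at x = (1, 1, …, 1) the first term vanishes and the sum reduces to Σ_{i=2}^{50} i = 1274.

```python
import numpy as np

def dixon_price(x):
    """Dixon & Price benchmark; x_i in [-10, 10]."""
    x = np.asarray(x, dtype=float)
    i = np.arange(2, len(x) + 1)                          # indices i = 2..n
    return (x[0] - 1) ** 2 + np.sum(i * (2 * x[1:] ** 2 - x[:-1]) ** 2)
```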
Appendix C: Relationship between the error of the model selected by COSMOS and the size and distribution of the training data set
In this section, we explore the relationship between the error of the model selected by COSMOS and the size and distribution of the training data set, for the Branin-Hoo benchmark function. A single model-kernel combination is considered here (RBF with the Multiquadric basis function). A set of 200 training data sets is randomly generated: \(X_{1}, X_{2}, X_{3}, \ldots, X_{200}\). The sizes of the training sets are defined to be: \(X_{1}\) to \(X_{40}\) contain 30 points each, \(X_{41}\) to \(X_{80}\) contain 60, \(X_{81}\) to \(X_{120}\) contain 90, \(X_{121}\) to \(X_{160}\) contain 120, and \(X_{161}\) to \(X_{200}\) contain 150. The distribution of samples differs (randomly) across sets of the same size. For each data set, COSMOS is applied to find the hyper-parameter value that minimizes the median error metric.
The median error of the selected model for the different data sets is illustrated in Fig. 21 as a series of boxplots, with each boxplot corresponding to sample sets of one given size. It is readily evident that the error of the selected model is highly sensitive to the size of the training data set. Although the influence of the training data distribution is also evident from the significant observed variance in the resulting model error, no particular trend is observed.
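The size-sensitivity trend reported above can be reproduced in miniature with a numpy sketch: fit a multiquadric RBF to random Branin-Hoo samples of increasing size and record the median absolute error on held-out points. The sample sizes match Fig. 21, but the test-point count and σ = 1 are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
lo, hi = np.array([-5.0, 0.0]), np.array([10.0, 15.0])   # Branin-Hoo domain

def branin(x1, x2):
    b, c = 5.1 / (4 * np.pi ** 2), 5 / np.pi
    return (x2 - b * x1 ** 2 + c * x1 - 6) ** 2 \
        + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10

def multiquadric(r, sigma=1.0):
    return np.sqrt(r ** 2 + sigma ** 2)

def median_holdout_error(n_train, n_test=500):
    """Median absolute error of a multiquadric RBF fit to n_train random samples."""
    X = lo + (hi - lo) * rng.random((n_train, 2))
    y = branin(X[:, 0], X[:, 1])
    Psi = multiquadric(np.linalg.norm(X[:, None] - X[None, :], axis=-1))
    w = np.linalg.pinv(Psi) @ y                           # pseudo-inverse weights
    Xt = lo + (hi - lo) * rng.random((n_test, 2))
    pred = multiquadric(np.linalg.norm(Xt[:, None] - X[None, :], axis=-1)) @ w
    return np.median(np.abs(pred - branin(Xt[:, 0], Xt[:, 1])))

errors = {n: median_holdout_error(n) for n in (30, 60, 90, 120, 150)}
```

With a fixed seed, the median error falls sharply as the training-set size grows, mirroring the boxplot trend in Fig. 21.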
Appendix D: Implementation of COSMOS on Hartmann and Perm functions
The final solutions, including the best trade-offs between the median and the maximum errors (in the Φ 0, Φ 1, and Φ 2 classes) for the Hartmann and Perm functions, are illustrated in Figs. 22 and 23. For the Hartmann test function, RBF with the Gaussian and Multiquadric basis functions and Kriging with the Gaussian correlation function under the Φ 1 class constitute the set of Pareto models in both the Cascaded and One-Step techniques (Table 4). It is observed from Fig. 22 that, in this problem, the Pareto solutions given by COSMOS for the different hyper-parameter classes have a larger spread than those given by the actual error. However, in terms of the best trade-off models, there is fair agreement between the results of COSMOS and those determined from the actual errors (Table 4). Unlike the Branin-Hoo test function, for the Hartmann function there is noticeable overlap between the final solutions from the different hyper-parameter classes.
Figure 23 and Table 4 show that, for the Perm test function, at least one model-kernel combination from each of the three classes (Φ 0, Φ 1, and Φ 2) contributes to the Pareto optimal set. In this test problem, Kriging with the Linear correlation function and SVR with the Sigmoid kernel function are selected as the best models with the lowest median error and the lowest maximum error, respectively. It can be seen from Table 4 that there is promising agreement between the model-kernel combinations chosen by COSMOS and those chosen based on the actual error. From the COSMOS and actual-error results, we also observe that a Pareto solution from the Φ 0 class (RBF-Linear) is located at the elbow of the Pareto frontier, which could be considered to represent a practically attractive, best trade-off model choice.
Mehmani, A., Chowdhury, S., Meinrenken, C. et al. Concurrent surrogate model selection (COSMOS): optimizing model type, kernel function, and hyper-parameters. Struct Multidisc Optim 57, 1093–1114 (2018). https://doi.org/10.1007/s00158-017-1797-y