
Concurrent surrogate model selection (COSMOS): optimizing model type, kernel function, and hyper-parameters

Research Paper · Published in Structural and Multidisciplinary Optimization

Abstract

This paper presents an automated surrogate model selection framework called Concurrent Surrogate Model Selection (COSMOS). Unlike most existing techniques, COSMOS coherently operates at three levels, namely: (1) selecting the model type (e.g., RBF or Kriging); (2) selecting the kernel function type (e.g., cubic or multiquadric kernel in RBF); and (3) determining the optimal values of the typically user-prescribed hyper-parameters (e.g., the shape parameter in RBF). The quality of the models is determined and compared using measures of median and maximum error given by the Predictive Estimation of Model Fidelity (PEMF) method, a robust implementation of sequential k-fold cross-validation. The selection process follows either a cascaded approach over the three levels or a more computationally efficient one-step approach that solves a mixed-integer nonlinear programming problem; genetic algorithms perform the optimal selection in both cases. Applying COSMOS to benchmark test functions yields optimal model choices that agree well with those obtained by analyzing the model errors on a large set of additional test points. For the four analytical benchmark problems and three practical engineering applications (airfoil design, window heat transfer modeling, and building energy modeling), diverse forms of models and kernels are selected as the optimal choices. These observations further establish the need for automated multi-level model selection that is also guided by dependable measures of model fidelity.



Notes

  1. https://www.mathworks.com/matlabcentral/fileexchange/60106-pemf-cross-validation

References

  • Acar E (2010) Optimizing the shape parameters of radial basis functions: an application to automobile crashworthiness. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering 224(12):1541–1553

  • Acar E, Rais-Rohani M (2009) Ensemble of metamodels with optimized weight factors. Struct Multidiscip Optim 37(3):279–294

  • Ali MM, Khompatraporn C, Zabinsky ZB (2005) A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. J Glob Optim 31(4):635–672

  • Ascione F, Bianco N, Stasio CD, Mauro GM, Vanoli GP (2017) Artificial neural networks to predict energy performance and retrofit scenarios for any member of a building category: a novel approach. Energy 118:999–1017

  • Basak D, Pal S, Patranabis DC (2007) Support vector regression. Neural Information Processing – Letters and Reviews 11(10):203–224

  • Ben-Hur A, Weston J (2010) A user's guide to support vector machines. Data Mining Techniques for the Life Sciences, pp 223–239

  • Blatman G, Sudret B (2011) Adaptive sparse polynomial chaos expansion based on least angle regression. J Comput Phys 230(6):2345–2367

  • Bozdogan H (2000) Akaike's information criterion and recent developments in information complexity. J Math Psychol 44:62–91

  • Chang C-C, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27

  • Chen PW, Wang JY, Lee HM (2004) Model selection of SVMs using GA approach. In: IEEE international joint conference on neural networks, IEEE, vol 3, pp 2035–2040

  • Chen X, Yang H, Sun K (2017) Developing a meta-model for sensitivity analyses and prediction of building performance for passively designed high-rise residential buildings. Appl Energy 194:422–439

  • Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge Books

  • Coelho F, Breitkopf P, Knopf-Lenoir C (2008) Model reduction for multidisciplinary optimization: application to a 2D wing. Struct Multidiscip Optim 37(1):29–48

  • Couckuyt I, Dhaene T, Demeester P (2014) ooDACE toolbox: a flexible object-oriented Kriging implementation. J Mach Learn Res 15(1):3183–3186

  • Crawley DB, Pedersen CO, Lawrie LK, Winkelmann FC (2000) EnergyPlus: energy simulation program. ASHRAE J 49(4)

  • Cressie N (1993) Statistics for spatial data. Wiley, New York

  • Deb K (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197

  • Deru M, Field K, Studer D, Benne K, Griffith B, Torcellini P, Liu B (2011) US Department of Energy commercial reference building models of the national building stock. Tech. rep., Department of Energy

  • Deschrijver D, Dhaene T (2005) An alternative approach to avoid overfitting for surrogate models. In: Signal propagation on interconnects, 9th IEEE workshop, pp 111–114

  • DOE (2017) Commercial prototype building models. http://www.energycodes.gov/development/commercial. Accessed 15 Jan 2017

  • Fang A, Rais-Rohani M, Liu Z, Horstemeyer MF (2005) A comparative study of metamodeling methods for multiobjective crashworthiness optimization. Comput Struct 83(25):2121–2136

  • Forrester A, Keane A (2009) Recent advances in surrogate-based optimization. Prog Aerosp Sci 45(1-3):50–79

  • Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley

  • Giovanis DG, Papaioannou I, Straub D, Papadopoulos V (2017) Bayesian updating with subset simulation using artificial neural networks. Comput Methods Appl Mech Eng 319:124–145

  • Giunta AA, Watson L (1998) A comparison of approximation modeling techniques: polynomial versus interpolating models. AIAA Journal (AIAA-98-4758)

  • Goel T, Stander N (2009) Comparing three error criteria for selecting radial basis function network topology. Comput Methods Appl Mech Eng 198:2137–2150

  • Gorissen D, Dhaene T, Turck FD (2009) Evolutionary model type selection for global surrogate modeling. J Mach Learn Res 10:2039–2078

  • Gorissen D, Couckuyt I, Demeester P, Dhaene T, Crombecq K (2010) A surrogate modeling and adaptive sampling toolbox for computer based design. J Mach Learn Res 11:2051–2055

  • Haftka RT, Villanueva D, Chaudhuri A (2016) Parallel surrogate-assisted global optimization with expensive functions: a survey. Struct Multidiscip Optim 54(1):3–13

  • Hamza K, Saitou K (2012) A co-evolutionary approach for design optimization via ensembles of surrogates with application to vehicle crashworthiness. J Mech Des 134(1):011001

  • Hardy RL (1971) Multiquadric equations of topography and other irregular surfaces. J Geophys Res 76:1905–1915

  • Holena M, Demut R (2011) Assessing the suitability of surrogate models in evolutionary optimization. In: Information technologies, pp 31–38

  • Jakeman JD, Narayan A, Zhou T (2017) A generalized sampling and preconditioning scheme for sparse approximation of polynomial chaos expansions. SIAM J Sci Comput 39(3):A1114–A1144

  • Jia G, Taflanidis AA (2013) Kriging metamodeling for approximation of high-dimensional wave and surge responses in real-time storm/hurricane risk assessment. Comput Methods Appl Mech Eng 261:24–38

  • Jin R, Chen W, Simpson TW (2001) Comparative studies of metamodelling techniques under multiple modelling criteria. Struct Multidiscip Optim 23(1):1–13

  • Lee H, Jo Y, Lee D, Choi S (2016) Surrogate model based design optimization of multiple wing sails considering flow interaction effect. Ocean Eng 121:422–436

  • Li YF, Ng SH, Xie M, Goh TN (2010) A systematic comparison of metamodeling techniques for simulation optimization in decision support systems. Appl Soft Comput 10(2):255–268

  • Lin S (2011) An NSGA-II program in MATLAB, version 1.4

  • Lophaven SN, Nielsen HB, Sondergaard J (2002) DACE: a MATLAB Kriging toolbox, version 2.0. Tech. Rep. IMM-REP-2002-12, Informatics and Mathematical Modelling, Technical University of Denmark

  • Martin JD, Simpson TW (2005) Use of kriging models to approximate deterministic computer models. AIAA J 43(4):853–863

  • Mehmani A, Chowdhury S, Messac A (2015a) Predictive quantification of surrogate model fidelity based on modal variations with sample density. Struct Multidiscip Optim 52(2):353–373

  • Mehmani A, Chowdhury S, Tong W, Messac A (2015b) Adaptive switching of variable-fidelity models in population-based optimization. In: Engineering and applied sciences optimization, computational methods in applied sciences, vol 38. Springer International Publishing, pp 175–205

  • Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307

  • Mongillo M (2011) Choosing basis functions and shape parameters for radial basis function methods. In: SIAM Undergraduate Research Online

  • Qudeiri JEA, Khadra FYA, Umer U, Hussein HMA (2015) Response surface metamodel to predict springback in sheet metal air bending process. International Journal of Materials, Mechanics and Manufacturing 3(4):203–224

  • Queipo N, Haftka R, Shyy W, Goel T, Vaidyanathan R, Tucker P (2005) Surrogate-based analysis and optimization. Prog Aerosp Sci 41(1):1–28

  • Reute IM, Mailach VR, Becker KH, Fischersworring-Bunk A, Schlums H, Ivankovic M (2017) Moving least squares metamodels: hyperparameter, variable reduction and model selection. In: 14th international probabilistic workshop. Springer International Publishing, pp 63–80

  • Rippa S (1999) An algorithm for selecting a good value for the parameter c in radial basis function interpolation. Adv Comput Math 11(2-3):193–210

  • Roustant O, Ginsbourger D, Deville Y (2012) DiceKriging, DiceOptim: two R packages for the analysis of computer experiments by Kriging-based metamodeling and optimization. J Stat Softw 51(1):518–523

  • Soares C, Brazdil PB, Kuba P (2004) A meta-learning method to select the kernel width in support vector regression. Mach Learn 54(3):195–209

  • Solomatine D, Ostfeld A (2008) Data-driven modelling: some past experiences and new approaches. J Hydroinf 10(1):3–22

  • Takahashi R, Prasai D, Adams BL, Mattson CA (2012) Hybrid Bishop-Hill model for elastic-yield limited design with non-orthorhombic polycrystalline metals. J Eng Mater Technol 134(1):011003

  • Tian W (2013) A review of sensitivity analysis methods in building energy analysis. Renew Sust Energ Rev 20:411–419

  • Viana FAC, Haftka RT, Steffen V (2009) Multiple surrogates: how cross-validation errors can help us to obtain the best predictor. Struct Multidiscip Optim 39:439–457

  • Viana FAC, Venter G, Balabanov V (2010) An algorithm for fast optimal Latin hypercube design of experiments. Int J Numer Methods Eng 82(2):135–156

  • Zhang J, Messac A, Zhang J, Chowdhury S (2014) Adaptive optimal design of active thermoelectric windows using surrogate modeling. Optim Eng 15(2):469–483

  • Zhang M, Gou W, Li L, Yang F, Yue Z (2016) Multidisciplinary design and multi-objective optimization on guide fins of twin-web disk using Kriging surrogate model. Struct Multidiscip Optim 55(1):361–373

  • Zhang Y, Park C, Kim NH, Haftka RT (2017) Function prediction at one inaccessible point using converging lines. J Mech Des 139(5):051402


Acknowledgements

Support from the National Science Foundation (NSF) Awards CMMI-1642340 and CNS-1524628 is gratefully acknowledged. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF.

Author information


Contributions

The core concepts underlying PEMF and COSMOS were conceived, implemented, and tested (in MATLAB) by Ali Mehmani and Souma Chowdhury, with important conceptual contributions from Achille Messac with regard to the surrogate modeling paradigm. The airfoil design and building peak cooling models in this paper were developed and implemented by Ali Mehmani, with support from Christoph Meinrenken on the latter.

Corresponding author

Correspondence to Souma Chowdhury.

Additional information

Parts of this manuscript were presented at the ASME International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, 2014, Buffalo, NY (Paper Number DETC2014-35358).

Appendices

Appendix A: Surrogate model candidates

1.1 Radial basis function (RBF)

The idea of using Radial Basis Functions (RBF) as approximation functions was conceived by Hardy (1971). The RBF approximation is a linear combination of basis functions (Ψ), computed with respect to each sample point, as given by

$$ \bar{F}(x)=W^{T}\Psi=\sum\limits_{i=1}^{n_{p}} w_{i} \psi\left( \|x-x^{i}\|\right) $$
(9)

In (9), \(n_{p}\) denotes the number of selected sample points; the \(w_{i}\) are weights estimated from the training data using the pseudo-inverse method; and \(\psi\) is the basis function, expressed in terms of the Euclidean distance, \(r = \|x - x^{i}\|\), of a point \(x\) from a given sample point \(x^{i}\). The shape parameter \(\sigma\) of a basis function has a strong impact on the accuracy of the trained RBF: a smaller shape parameter often corresponds to a wider basis function, and \(\sigma = 0\) corresponds to a constant basis function (Mongillo 2011). The different basis functions considered in this paper are listed in Table 5.

Table 5 Basis or Kernel functions and their hyper-parameters in the candidate surrogate models
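To make the weight-estimation step concrete, the following Python sketch (illustrative only; the authors' implementation is in MATLAB) fits and evaluates a multiquadric RBF surrogate via the pseudo-inverse, per (9). The multiquadric form \(\psi(r) = \sqrt{r^{2} + \sigma^{2}}\) is one common convention and is assumed here.

    # Minimal RBF surrogate sketch (multiquadric basis), per (9).
    # Assumes psi(r) = sqrt(r^2 + sigma^2); illustrative, not the authors' code.
    import numpy as np

    def rbf_fit(X, y, sigma):
        # Pairwise Euclidean distances between the n_p training points
        r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        Psi = np.sqrt(r**2 + sigma**2)        # basis matrix
        return np.linalg.pinv(Psi) @ y        # weights via pseudo-inverse

    def rbf_predict(X_train, w, sigma, x):
        r = np.linalg.norm(X_train - x, axis=-1)
        return np.sqrt(r**2 + sigma**2) @ w

    # Usage on a toy 2D data set (sigma is the hyper-parameter COSMOS optimizes)
    rng = np.random.default_rng(0)
    X = rng.uniform(-5, 10, size=(30, 2))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
    w = rbf_fit(X, y, sigma=1.0)
    print(rbf_predict(X, w, 1.0, X[0]), y[0])  # should nearly interpolate

Because the trained model interpolates the samples, the choice of \(\sigma\) governs how the surrogate behaves between them, which is why it is treated as a hyper-parameter to be optimized rather than fixed a priori.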

1.2 Kriging

Kriging (Giunta and Watson 1998) is an approach to approximating irregular data. The Kriging approximation function consists of two components: (i) a global trend function, and (ii) a deviation function representing the departure from the trend function. The trend function is generally a polynomial (e.g., constant, linear, or quadratic). The general form of the Kriging surrogate model is given by Cressie (1993):

$$ \bar{F}(x)= \hat{f}(x,\varphi) + Z(x) $$
(10)

where \(\bar{F}(x)\) is the unknown function of interest, \(Z(x)\) is the realization of a stochastic process with zero mean and nonzero covariance, and \(\hat{f}\) is the known approximation (trend) function

$$ \hat{f}(x,\varphi)=f(x)^{T}\varphi $$
(11)

where \(\varphi\) is the regression parameter matrix. The \((i,j)\)th element of the covariance matrix of \(Z(x)\) is given by

$$ COV[Z(x^{i}),Z(x^{j})] = {\sigma_{z}^{2}} R_{ij} $$
(12)

where \(R_{ij}\) is the correlation function between the \(i\)th and the \(j\)th data points, and \({\sigma_{z}^{2}}\) is the process variance, which scales the spatial correlation function. The popular types of correlation functions are listed in Table 5. The correlation function controls the smoothness of the Kriging model estimate, based on the influence of other nearby points on the point of interest. In Kriging, the regression function coefficients, the process variance, and the correlation function parameters, \(\{\varphi, {\sigma_{z}^{2}}, \theta\}\), can each be predefined or estimated using parameter estimation methods such as Maximum Likelihood Estimation (MLE). In this paper, the regression function coefficients and the process variance are estimated using MLE, as given by

$$\begin{array}{@{}rcl@{}} \varphi&=&(F^{T} R^{-1}F)^{-1} F^{T}R^{-1}Y \\ {\sigma_{z}^{2}}&=&\frac{1}{n}(Y-F\widetilde{\varphi})^{T} R^{-1} (Y-F\widetilde{\varphi}) \end{array} $$
(13)

where \(Y = [y_{1}\ y_{2}\ \ldots]\) represents the vector of the actual outputs at the training points; \(R\) is the correlation matrix; and \(F\) is the matrix of \(f(x)\) evaluated at each training point (Martin and Simpson 2005). The hyper-parameter \(\theta\) in the correlation function is determined by solving a nonlinear hyper-parameter optimization problem. In this paper, a Kriging model with a first-order polynomial regression function is used, and the regression function coefficients and process variance are estimated using the DACE (design and analysis of computer experiments) package developed by Lophaven et al. (2002).
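As an illustration of (10)-(13), the Python sketch below implements a simplified Kriging model with a constant trend and Gaussian correlation, using a fixed \(\theta\) rather than the MLE-driven hyper-parameter optimization performed via DACE in the paper; it is a sketch under those assumptions, not the authors' implementation.

    # Simplified Kriging sketch per (10)-(13): constant trend, Gaussian
    # correlation, fixed theta (the paper uses a first-order trend and
    # MLE-optimized theta via DACE). Illustrative assumptions throughout.
    import numpy as np

    def corr_gauss(A, B, theta):
        d2 = ((A[:, None, :] - B[None, :, :])**2 * theta).sum(axis=-1)
        return np.exp(-d2)

    def krige_fit(X, y, theta):
        n = len(y)
        R = corr_gauss(X, X, theta) + 1e-10 * np.eye(n)   # small nugget for stability
        Rinv = np.linalg.inv(R)
        F = np.ones((n, 1))                               # constant regression basis f(x) = 1
        phi = np.linalg.solve(F.T @ Rinv @ F, F.T @ Rinv @ y)  # GLS estimate, cf. (13)
        resid = y - F @ phi
        sigma2 = (resid @ Rinv @ resid) / n               # process variance, cf. (13)
        return phi, sigma2, Rinv, resid

    def krige_predict(X, Xnew, theta, phi, Rinv, resid):
        r = corr_gauss(Xnew, X, theta)                    # correlations with training points
        return phi[0] + r @ (Rinv @ resid)                # trend + correlated deviation

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(40, 2))
    y = np.sin(3 * X[:, 0]) + X[:, 1]**2
    phi, s2, Rinv, resid = krige_fit(X, y, theta=np.array([10.0, 10.0]))
    print(krige_predict(X, X[:1], np.array([10.0, 10.0]), phi, Rinv, resid), y[0])

In COSMOS, the \(\theta\) held fixed here is precisely the hyper-parameter searched over at the third selection level.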

1.3 Support vector regression (SVR)

Support Vector Regression (SVR) is a relatively recent regression-type surrogate model. For a given training set of instance-label pairs \((x_{i}, y_{i}),\ i = 1,\ldots,n_{p}\), where \(x_{i} \in \mathbb{R}^{n}\) and \(y_{i} \in \mathbb{R}\), a linear SVR is defined by \(f(x) = \langle w, x\rangle + b\), where \(b\) is a bias and \(\langle\cdot,\cdot\rangle\) denotes the dot product. To train the SVR, the error, \(|\xi| = |y - f(x)|\), is minimized by solving the following convex optimization problem:

$$\begin{array}{@{}rcl@{}} & & \text{Min} \left\{ \frac{1}{2} \| w \|^{2}+C \sum\limits_{i=1}^{n_{p}} \left( \xi_{i}+\tilde{\xi}_{i}\right)\right\} \\ & & \text{subject to}\\ & & (w^{T}x_{i}+b)-y_{i}\leq \varepsilon+\xi_{i} \\ & & y_{i}-(w^{T}x_{i}+b)\leq \varepsilon+\tilde{\xi}_{i} \\ & & \xi_{i}, \tilde{\xi}_{i}\geq 0, \quad i=1,2,\ldots,n_{p} \end{array} $$
(14)

In (14), \(\varepsilon \geq 0\) represents the tolerated difference between the actual and the predicted values; \(\xi_{i}\) and \(\tilde{\xi}_{i}\) are the slack variables; \(C\) is the penalty parameter that controls the flatness of the function; and \(n_{p}\) represents the number of training points. By applying kernel functions, \(K(\alpha,\beta) = \langle\phi(\alpha),\phi(\beta)\rangle\), under the KKT conditions, the original problem is mapped into a higher-dimensional space. The dual form of SVR for nonlinear regression can be represented as

$$\begin{array}{@{}rcl@{}} & & \text{Max}\ \left\{\sum\limits_{i=1}^{n_{p}}\alpha_{i} y_{i} - \varepsilon \sum\limits_{i=1}^{n_{p}} | \alpha_{i} | - \frac{1}{2} \sum\limits_{i,j=1}^{n_{p}} \alpha_{i}\alpha_{j} \langle\phi(x_{i}),\phi(x_{j})\rangle\right\} \\ & & \text{subject to}\\ & & \sum\limits_{i=1}^{n_{p}} \alpha_{i}=0, \quad -C \leq \alpha_{i} \leq C \quad \text{for } i=1,\ldots,n_{p} \end{array} $$
(15)

The standard kernel functions used in SVR are listed in Table 5. The performance of SVR depends on its penalty parameter \(C\) and the kernel parameters \(\gamma\), \(r\), and \(d\). Through hyper-parameter optimization, these parameters can be estimated so as to minimize the model error. To implement SVR in this paper, the LIBSVM (A Library for Support Vector Machines) package (Chang and Lin 2011) is used.
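As a brief usage sketch of (14)-(15): scikit-learn's SVR class, which wraps LIBSVM, can train an ε-SVR with an RBF kernel. The data and hyper-parameter values below are illustrative placeholders, not COSMOS-selected ones.

    # epsilon-SVR with an RBF kernel via scikit-learn (wraps LIBSVM).
    # C, epsilon, gamma below are arbitrary illustrative values.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = rng.uniform(-5, 10, size=(60, 2))
    y = np.sin(X[:, 0]) + 0.1 * X[:, 1]

    model = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma=0.5)
    model.fit(X, y)
    print(model.predict(X[:3]), y[:3])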

Appendix B: Analytical test functions

Branin-Hoo function (2 variables):

$$ f(x) = \left( x_{2} - \frac{5.1{x_{1}^{2}}}{4\pi^{2}} + \frac{5x_{1}}{\pi} -6 \right)^{2} + 10 \left( 1 - \frac{1}{8\pi} \right) \cos(x_{1}) + 10 $$
(16)

where \(x_{1} \in [-5, 10]\) and \(x_{2} \in [0, 15]\).

Hartmann function (3 variables):

$$ f(x) = - \sum\limits_{i=1}^{4} c_{i} \exp \left\{ - \sum\limits_{j=1}^{n} A_{ij} \left( x_{j} - P_{ij} \right)^{2} \right\} $$
(17)

where \(x = (x_{1}, x_{2}, \ldots, x_{n})\) and \(x_{i} \in [0, 1]\).

In this function, the number of variables is \(n = 3\); the constants \(c\), \(A\), and \(P\) are, respectively, a \(4 \times 1\) vector, a \(4 \times 3\) matrix, and a \(4 \times 3\) matrix:

$$\begin{array}{@{}rcl@{}} c &=& [1.0,\ 1.2,\ 3.0,\ 3.2]^{T};\\ A &=& \left[ \begin{array}{ccc} 3.0 & 10 & 30 \\ 0.1 & 10 & 35 \\ 3.0 & 10 & 30 \\ 0.1 & 10 & 35 \end{array} \right], \text{ and } P = 10^{-4} \times \left[ \begin{array}{ccc} 3689 & 1170 & 2673 \\ 4699 & 4387 & 7470 \\ 1091 & 8732 & 5547 \\ 381 & 5743 & 8828 \end{array}\right] \end{array} $$

Perm Function (10 variables):

$$ f(x) = \sum\limits_{k=1}^{n}\left\{\sum\limits_{j=1}^{k}(j^{k}+0.5)\left[\left( \frac{x_{j}}{j}\right)^{k}-1\right]\right\}^{2} $$
(18)

where \(x_{i} \in [-n, n+1],\ i = 1,\ldots,n\), and \(n = 10\).

Dixon & Price Function (50 variables):

$$ f(x) = \left( x_{1} - 1 \right)^{2} + \sum\limits_{i=2}^{n} i\left( 2{x_{i}^{2}} - x_{i-1} \right)^{2} $$
(19)

where \(x_{i} \in [-10, 10],\ i = 1,\ldots,n\), and \(n = 50\).
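For reproducibility, the four test functions (16)-(19) translate directly into code; the following Python sketch implements them exactly as written above (Hartmann shown for n = 3).

    # Benchmark test functions (16)-(19), transcribed as written.
    import numpy as np

    def branin_hoo(x):                       # (16): x1 in [-5, 10], x2 in [0, 15]
        x1, x2 = x
        return ((x2 - 5.1*x1**2/(4*np.pi**2) + 5*x1/np.pi - 6)**2
                + 10*(1 - 1/(8*np.pi))*np.cos(x1) + 10)

    A = np.array([[3.0, 10, 30], [0.1, 10, 35], [3.0, 10, 30], [0.1, 10, 35]])
    P = 1e-4 * np.array([[3689, 1170, 2673], [4699, 4387, 7470],
                         [1091, 8732, 5547], [381, 5743, 8828]])
    c = np.array([1.0, 1.2, 3.0, 3.2])

    def hartmann3(x):                        # (17): x_i in [0, 1], n = 3
        return -np.sum(c * np.exp(-np.sum(A * (np.asarray(x) - P)**2, axis=1)))

    def perm(x):                             # (18): x_i in [-n, n+1]
        x = np.asarray(x, dtype=float)
        n = len(x)
        total = 0.0
        for k in range(1, n + 1):
            j = np.arange(1, k + 1)
            total += np.sum((j**k + 0.5) * ((x[:k] / j)**k - 1))**2
        return total

    def dixon_price(x):                      # (19): x_i in [-10, 10]
        x = np.asarray(x, dtype=float)
        i = np.arange(2, len(x) + 1)
        return (x[0] - 1)**2 + np.sum(i * (2*x[1:]**2 - x[:-1])**2)

    print(branin_hoo([np.pi, 2.275]))        # ~0.3979, a known minimum value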

Appendix C: Relationship between the error of the model selected by COSMOS and the size and distribution of the training data set

In this section, we explore the relationship between the error of the model selected by COSMOS and the size and distribution of the training data set, for the Branin-Hoo benchmark function. A single model-kernel combination is considered here (RBF with the Multiquadric function). A set of 200 training data sets is randomly generated: \(X_{1}, X_{2}, X_{3}, \ldots, X_{200}\). The sizes of the training data sets are defined as follows: \(X_{1}\) to \(X_{40}\) contain 30 points each; \(X_{41}\) to \(X_{80}\), 60 points; \(X_{81}\) to \(X_{120}\), 90 points; \(X_{121}\) to \(X_{160}\), 120 points; and \(X_{161}\) to \(X_{200}\), 150 points. The distribution of samples differs (randomly) across sets of the same size. For each data set, COSMOS is applied to find the hyper-parameter value that minimizes the median error metric.

The median error of the selected model for the different data sets is illustrated in Fig. 21 as a series of boxplots, with each boxplot corresponding to sample sets of one given size. It is readily evident that the error of the selected model is highly sensitive to the size of the training data set. Although the influence of the training data distribution is also evident from the significant variance observed in the resulting model error, no particular trend emerges.

Fig. 21

Variation of the estimated error measures of the COSMOS-yielded models, with respect to training set size and variation in sample distribution (for Branin-Hoo Function)
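A minimal sketch of this study, using a simple hold-out median error as a stand-in for the PEMF error metric and a fixed shape parameter (both simplifications relative to the actual experiment), could look like:

    # Sketch of the Appendix C study: median test error of a multiquadric
    # RBF (fixed sigma, a simplification) vs. training-set size on
    # Branin-Hoo, with 40 random sets per size as in the study above.
    # Hold-out median error is used here as a stand-in for PEMF.
    import numpy as np

    def branin_hoo(x1, x2):
        return ((x2 - 5.1*x1**2/(4*np.pi**2) + 5*x1/np.pi - 6)**2
                + 10*(1 - 1/(8*np.pi))*np.cos(x1) + 10)

    def dist(A, B):
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

    def rbf_fit_predict(Xtr, ytr, Xte, sigma=1.0):
        w = np.linalg.pinv(np.sqrt(dist(Xtr, Xtr)**2 + sigma**2)) @ ytr
        return np.sqrt(dist(Xte, Xtr)**2 + sigma**2) @ w

    rng = np.random.default_rng(1)
    Xte = rng.uniform([-5, 0], [10, 15], size=(500, 2))   # dense test sample
    yte = branin_hoo(Xte[:, 0], Xte[:, 1])
    for size in (30, 60, 90, 120, 150):
        errs = []
        for _ in range(40):                               # 40 random sets per size
            Xtr = rng.uniform([-5, 0], [10, 15], size=(size, 2))
            ytr = branin_hoo(Xtr[:, 0], Xtr[:, 1])
            errs.append(np.median(np.abs(rbf_fit_predict(Xtr, ytr, Xte) - yte)))
        print(size, round(np.median(errs), 3))            # error falls as size grows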

Appendix D: Implementation of COSMOS on Hartmann and Perm functions

The final solutions, including the best trade-offs between the median and the maximum errors (in the Φ0, Φ1, and Φ2 classes) for the Hartmann and Perm functions, are illustrated in Figs. 22 and 23. For the Hartmann test function, RBF with the Gaussian and Multiquadric basis functions and Kriging with the Gaussian correlation function, under the Φ1 class, constitute the set of Pareto models for both the Cascaded and One-Step techniques (Table 4). It is observed from Fig. 22 that, in this problem, the Pareto solutions given by COSMOS for the different hyper-parameter classes have a larger spread than those given by the actual error. However, in terms of the best trade-off models, there is fair agreement between the results of COSMOS and those determined from the actual errors (Table 4). Unlike in the Branin-Hoo test function, in the Hartmann function there is noticeable overlap between the final solutions from the different hyper-parameter classes.

Fig. 22

Trade-offs between modal values of median and maximum error - Hartmann test function (3-variable): Pareto optimal models and final population of models from all Φ classes

Figure 23 and Table 4 show that, for the Perm test function, at least one model-kernel combination from each of the three classes (Φ0, Φ1, and Φ2) contributes to the Pareto optimal set. In this test problem, Kriging with the Linear correlation function and SVR with the Sigmoid kernel function are selected as the best models with the lowest median error and the lowest maximum error, respectively. It can be seen from Table 4 that there is promising agreement between the model-kernel combinations chosen by COSMOS and those chosen based on the actual error. From the COSMOS and actual-error results, we also observe that a Pareto solution from the Φ0 class (RBF-Linear) is located at the elbow of the Pareto frontier, which could be considered to represent a practically attractive, best trade-off model choice.

Fig. 23

Trade-offs between modal values of median and maximum error - Perm test function (10-variable): Pareto optimal models and final population of models from all Φ classes


Cite this article

Mehmani, A., Chowdhury, S., Meinrenken, C. et al. Concurrent surrogate model selection (COSMOS): optimizing model type, kernel function, and hyper-parameters. Struct Multidisc Optim 57, 1093–1114 (2018). https://doi.org/10.1007/s00158-017-1797-y

