Abstract
This paper presents an automated surrogate model selection framework called Concurrent Surrogate Model Selection (COSMOS). Unlike most existing techniques, COSMOS coherently operates at three levels, namely: 1) selecting the model type (e.g., RBF or Kriging), 2) selecting the kernel function type (e.g., cubic or multiquadric kernel in RBF), and 3) determining the optimal values of the typically user-prescribed hyper-parameters (e.g., shape parameter in RBF). The quality of the models is determined and compared using measures of median and maximum error, given by the Predictive Estimation of Model Fidelity (PEMF) method. PEMF is a robust implementation of sequential k-fold cross-validation. The selection process undertakes either a cascaded approach over the three levels or a more computationally efficient one-step approach that solves a mixed-integer nonlinear programming problem. Genetic algorithms are used to perform the optimal selection. Application of COSMOS to benchmark test functions resulted in optimal model choices that agree well with those given by analyzing the model errors on a large set of additional test points. For the four analytical benchmark problems and three practical engineering applications – airfoil design, window heat transfer modeling, and building energy modeling – diverse forms of models/kernels are observed to be selected as optimal choices. These observations further establish the need for automated multi-level model selection that is also guided by dependable measures of model fidelity.
References
Acar E (2010) Optimizing the shape parameters of radial basis functions: An application to automobile crashworthiness. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering 224(12):1541–1553
Acar E, Rais-Rohani M (2009) Ensemble of metamodels with optimized weight factors. Struct Multidiscip Optim 37(3):279–294
Ali MM, Khompatraporn C, Zabinsky ZB (2005) A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. J Glob Optim 31(4):635–672
Ascione F, Bianco N, Stasio CD, Mauro GM, Vanoli GP (2017) Artificial neural networks to predict energy performance and retrofit scenarios for any member of a building category: a novel approach. Energy 26(118):999–1017
Basak D, Srimanta P, Patranabis DC (2007) Support vector regression. Neural Information Processing-Letters and Review 11(10):203–224
Ben-Hur A, Weston J (2010) A user’s guide to support vector machines. Data mining techniques for the life sciences, pp 223–239
Blatman G, Sudret B (2011) Adaptive sparse polynomial chaos expansion based on least angle regression. J Comput Phys 230(6):2345–2367
Bozdogan H (2000) Akaike’s information criterion and recent developments in information complexity. J Math Psychol 44:62–91
Chang C-C, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27
Chen PW, Wang JY, Lee HM (2004) Model selection of SVMs using GA approach. In: IEEE international joint conference on neural networks, 2004. Proceedings. 2004, IEEE, vol 3, pp 2035–2040
Chen X, Yang H, Sun K (2017) Developing a meta-model for sensitivity analyses and prediction of building performance for passively designed high-rise residential buildings. Appl Energy 194:422–439
Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge Books
Coelho F, Breitkopf P, Knopf-Lenoir C (2008) Model reduction for multidisciplinary optimization: application to a 2d wing. Struct Multidiscip Optim 37(1):29–48
Couckuyt I, Dhaene T, Demeester P (2014) ooDACE toolbox: a flexible object-oriented Kriging implementation. J Mach Learn Res 15(1):3183–3186
Crawley DB, Pedersen CO, Lawrie LK, Winkelmann FC (2000) EnergyPlus: energy simulation program. ASHRAE J 49(4)
Cressie N (1993) Statistics for spatial data. Wiley, New York
Deb K (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
Deru M, Field K, Studer D, Benne K, Griffith B, Torcellini P, Liu B (2011) US department of energy commercial reference building models of the national building stock. Tech. rep., Department of Energy
Deschrijver D, Dhaene T (2005) An alternative approach to avoid overfitting for surrogate models. In: Signal propagation on interconnects, 2005. Proceedings. 9th IEEE workshop, pp 111– 114
DOE (2017) Commercial prototype building models. http://www.energycodes.gov/development/commercial (accessed Jan 15, 2017)
Fang A, Rais-Rohani M, Liu Z, Horstemeyer MF (2005) A comparative study of metamodeling methods for multiobjective crashworthiness optimization. Comput Struct 83(25):2121–2136
Forrester A, Keane A (2009) Recent advances in surrogate-based optimization. Prog Aerosp Sci 45(1-3):50–79
Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley
Giovanis DG, Papaioannou I, Straub D, Papadopoulos V (2017) Bayesian updating with subset simulation using artificial neural networks. Comput Methods Appl Mech Eng 319:124–145
Giunta AA, Watson L (1998) A comparison of approximation modeling techniques: polynomial versus interpolating models. AIAA Journal (AIAA-98-4758)
Goel T, Stander N (2009) Comparing three error criteria for selecting radial basis function network topology. Comput Methods Appl Mech Eng 198:2137–2150
Gorissen D, Dhaene T, Turck FD (2009) Evolutionary model type selection for global surrogate modeling. J Mach Learn Res 10:2039–2078
Gorissen D, Couckuyt I, Demeester P, Dhaene T, Crombecq K (2010) A surrogate modeling and adaptive sampling toolbox for computer based design. J Mach Learn Res 11:2051–2055
Haftka RT, Villanueva D, Chaudhuri A (2016) Parallel surrogate-assisted global optimization with expensive functions – a survey. Struct Multidiscip Optim 54(1):3–13
Hamza K, Saitou K (2012) A co-evolutionary approach for design optimization via ensembles of surrogates with application to vehicle crashworthiness. J Mech Des 134(1):011001
Hardy RL (1971) Multiquadric equations of topography and other irregular surfaces. J Geophys Res 76:1905–1915
Holena M, Demut R (2011) Assessing the suitability of surrogate models in evolutionary optimization. In: Information technologies, pp 31–38
Jakeman JD, Narayan A, Zhou T (2017) A generalized sampling and preconditioning scheme for sparse approximation of polynomial chaos expansions. SIAM J Sci Comput 39(3):A1114–A1144
Jia G, Taflanidis AA (2013) Kriging metamodeling for approximation of high-dimensional wave and surge responses in real-time storm/hurricane risk assessment. Comput Methods Appl Mech Eng 261:24–38
Jin R, Chen W, Simpson TW (2001) Comparative studies of metamodelling techniques under multiple modelling criteria. Struct Multidiscip Optim 23(1):1–13
Lee H, Jo Y, Lee D, Choi S (2016) Surrogate model based design optimization of multiple wing sails considering flow interaction effect. Ocean Eng 121:422–436
Li YF, Ng SH, Xie M, Goh TN (2010) A systematic comparison of metamodeling techniques for simulation optimization in decision support systems. Appl Soft Comput 10(2):255–268
Lin S (2011) An NSGA-II program in MATLAB, version 1.4 ed
Lophaven SN, Nielsen HB, Sondergaard J (2002) DACE – a MATLAB Kriging toolbox, version 2.0. Tech. Rep. IMM-REP-2002-12. Informatics and Mathematical Modelling Report, Technical University of Denmark
Martin JD, Simpson TW (2005) Use of kriging models to approximate deterministic computer models. AIAA J 43(4):853–863
Mehmani A, Chowdhury S, Messac A (2015a) Predictive quantification of surrogate model fidelity based on modal variations with sample density. Struct Multidiscip Optim 52(2):353–373
Mehmani A, Chowdhury S, Tong W, Messac A (2015b) Adaptive switching of variable-fidelity models in population-based optimization. In: Engineering and applied sciences optimization, computational methods in applied sciences, vol 38. Springer International Publishing, pp 175–205
Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307
Mongillo M (2011) Choosing basis functions and shape parameters for radial basis function methods. In: SIAM undergraduate research online
Qudeiri JEA, Khadra FYA, Umer U, Hussein HMA (2015) Response surface metamodel to predict springback in sheet metal air bending process. International Journal of Materials, Mechanics and Manufacturing 3(4):203–224
Queipo N, Haftka R, Shyy W, Goel T, Vaidyanathan R, Tucker P (2005) Surrogate-based analysis and optimization. Prog Aerosp Sci 41(1):1–28
Reute IM, Mailach VR, Becker KH, Fischersworring-Bunk A, Schlums H, Ivankovic M (2017) Moving least squares metamodels – hyperparameter, variable reduction and model selection. In: 14th international probabilistic workshop. Springer International Publishing, pp 63–80
Rippa S (1999) An algorithm for selecting a good value for the parameter c in radial basis function interpolation. Adv Comput Math 11(2-3):193–210
Roustant O, Ginsbourger D, Deville Y (2012) DiceKriging, DiceOptim: two R packages for the analysis of computer experiments by Kriging-based metamodeling and optimization. J Stat Softw 51(1):518–523
Soares C, Brazdil PB, Kuba P (2004) A meta-learning method to select the kernel width in support vector regression. Mach Learn 54(3):195–209
Solomatine D, Ostfeld A (2008) Data-driven modelling: some past experiences and new approaches. J Hydroinf 10(1):3–22
Takahashi R, Prasai D, Adams BL, Mattson CA (2012) Hybrid bishop-hill model for elastic-yield limited design with non-orthorhombic polycrystalline metals. J Eng Mater Technol 134(1):0110,031–12
Tian W (2013) A review of sensitivity analysis methods in building energy analysis. Renew Sust Energ Rev 20:411–419
Viana FAC, Haftka RT, Steffen V (2009) Multiple surrogates: how cross-validation errors can help us to obtain the best predictor. Struct Multidiscip Optim 39:439–457
Viana FAC, Venter G, Balabanov V (2010) An algorithm for fast optimal latin hypercube design of experiments. Int J Numer Methods Eng 82(2):135–156
Zhang J, Messac A, Zhang J, Chowdhury S (2014) Adaptive optimal design of active thermoelectric windows using surrogate modeling. Optim Eng 15(2):469–483
Zhang M, Gou W, Li L, Yang F, Yue Z (2016) Multidisciplinary design and multi-objective optimization on guide fins of twin-web disk using Kriging surrogate model. Struct Multidiscip Optim 55(1):361–373
Zhang Y, Park C, Kim NH, Haftka RT (2017) Function prediction at one inaccessible point using converging lines. J Mech Des 139(5):051402
Acknowledgements
Support from the National Science Foundation (NSF) Awards CMMI-1642340 and CNS-1524628 is gratefully acknowledged. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF.
Author information
Contributions
The different core concepts underlying PEMF and COSMOS were conceived, implemented, and tested (in MATLAB) by Ali Mehmani and Souma Chowdhury, with important conceptual contributions from Achille Messac with regard to the surrogate modeling paradigm. The airfoil design and building peak cooling models in this paper were developed and implemented by Ali Mehmani, with support from Christoph Meinrenken on the latter.
Additional information
Parts of this manuscript have been presented at the ASME International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, 2014, Buffalo, NY, Paper No. DETC2014-35358.
Appendices
Appendix A: Surrogate model candidates
1.1 Radial basis function (RBF)
The idea of using Radial Basis Functions (RBF) as approximation functions was conceived by Hardy (1971). The RBF approximation is a linear combination of basis functions (ψ) computed with respect to each sample point, as given by

$$\tilde{f}(x) = \sum\limits_{i=1}^{n_{p}} w_{i}\, \psi\left(\left\| x - x_{i} \right\|\right) \qquad (9)$$

In (9), \(n_{p}\) denotes the number of selected sample points; the \(w_{i}\) are the weights, estimated using the pseudo-inverse method based on the training data; and ψ is the basis function, expressed in terms of the Euclidean distance, \(r = \|x - x_{i}\|\), of a point x from a given sample point \(x_{i}\). The different basis functions considered in this paper are listed in Table 5, where σ represents the shape parameter of the basis function. The shape parameter has a strong impact on the accuracy of the trained RBF: a smaller shape parameter often corresponds to a wider basis function, and σ = 0 corresponds to a constant basis function (Mongillo 2011).
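To make the pseudo-inverse weight estimation concrete, here is a minimal numpy sketch of RBF training and prediction with the multiquadric basis. The function names, shape parameter value, and the test response are illustrative choices, not part of COSMOS.

```python
import numpy as np

def rbf_fit(X, y, psi):
    """Estimate the RBF weights w by the pseudo-inverse method:
    solve Psi @ w = y, where Psi[i, j] = psi(||x_i - x_j||)."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.linalg.pinv(psi(r)) @ y

def rbf_predict(X_train, w, psi, X_new):
    """Evaluate the surrogate: f(x) = sum_i w_i * psi(||x - x_i||)."""
    r = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    return psi(r) @ w

# Multiquadric basis with shape parameter sigma (one of the kernels in Table 5)
sigma = 1.0
multiquadric = lambda r: np.sqrt(r**2 + sigma**2)

rng = np.random.default_rng(0)
X = rng.random((30, 2))                   # 30 training points in [0, 1]^2
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2    # illustrative response
w = rbf_fit(X, y, multiquadric)
```

Because this RBF form interpolates the training data, the fitted model reproduces the training responses to numerical precision.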
1.2 Kriging
Kriging (Giunta and Watson 1998) is an approach to approximating irregular data. The Kriging approximation function consists of two components: (i) a global trend function, and (ii) a deviation function representing the departure from the trend function. The trend function is generally a polynomial (e.g., constant, linear, or quadratic). The general form of the Kriging surrogate model is given by Cressie (1993):

$$\bar{F}(x) = \hat{f}(x) + Z(x) \qquad (10)$$

where \(\bar {F}(x)\) is the unknown function of interest, Z(x) is the realization of a stochastic process with zero mean and nonzero covariance, and \(\hat {f}\) is the known approximation (trend) function,

$$\hat{f}(x) = \varphi^{T} f(x) \qquad (11)$$

where φ is the regression parameter matrix. The (i,j)-th element of the covariance matrix of Z(x) is given by

$$\operatorname{Cov}\left[Z(x_{i}), Z(x_{j})\right] = {\sigma_{z}^{2}} R_{ij} \qquad (12)$$
where \(R_{ij}\) is the correlation function between the i-th and the j-th data points, and \({\sigma _{z}^{2}}\) is the process variance, which scales the spatial correlation function. The popular types of correlation functions are listed in Table 5. The correlation function controls the smoothness of the Kriging estimate by governing the influence of nearby points on the point of interest. In Kriging, the regression function coefficients, the process variance, and the correlation function parameters, \(\{\varphi ,{\sigma _{z}^{2}},\theta \}\), can each be predefined or estimated using parameter estimation methods such as Maximum Likelihood Estimation (MLE). In this paper, the regression function coefficients and the process variance are estimated using MLE, as given by

$$\hat{\varphi} = \left(F^{T} R^{-1} F\right)^{-1} F^{T} R^{-1} Y, \qquad \hat{\sigma}_{z}^{2} = \frac{1}{n_{p}} \left(Y - F\hat{\varphi}\right)^{T} R^{-1} \left(Y - F\hat{\varphi}\right) \qquad (13)$$

where \(Y = [y_{1}\ y_{2}\ \ldots]\) is the vector of actual outputs at the training points, R is the correlation matrix, and F is the matrix of the regression basis f(x) evaluated at each training point (Martin and Simpson 2005). The hyper-parameter θ in the correlation function is determined by solving a nonlinear hyper-parameter optimization problem. In this paper, a Kriging model with a first-order regression polynomial is used. To estimate the regression function coefficients and the process variance in Kriging, the DACE (design and analysis of computer experiments) package, developed by Lophaven et al. (2002), is used.
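As a concrete illustration of the closed-form MLE expressions above, the following numpy sketch fits a Kriging model with a constant trend and a Gaussian correlation function for a fixed θ. This is a simplified stand-in for the DACE toolbox, not its actual implementation; the nugget term, θ value, and test response are illustrative assumptions.

```python
import numpy as np

def kriging_fit(X, y, theta):
    """MLE of the regression coefficient phi and process variance sigma_z^2
    for a fixed Gaussian-correlation hyper-parameter theta (constant trend)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    R = np.exp(-theta * d2)                   # Gaussian correlation matrix R_ij
    R += 1e-10 * np.eye(len(X))               # tiny nugget for numerical stability
    F = np.ones((len(X), 1))                  # constant regression basis f(x) = 1
    Ri = np.linalg.inv(R)
    phi = float(np.linalg.solve(F.T @ Ri @ F, F.T @ Ri @ y))  # GLS / MLE of phi
    resid = y - phi
    sigma2 = float(resid @ Ri @ resid) / len(X)               # MLE of sigma_z^2
    return Ri, phi, sigma2

def kriging_predict(X, y, Ri, phi, theta, x_new):
    """Best linear unbiased prediction at a new point x_new."""
    r = np.exp(-theta * np.sum((X - x_new) ** 2, axis=-1))    # correlations to x_new
    return phi + r @ Ri @ (y - phi)

rng = np.random.default_rng(1)
X = rng.random((20, 2))
y = np.sin(3 * X[:, 0]) * X[:, 1]
Ri, phi, sigma2 = kriging_fit(X, y, theta=10.0)
```

Since Kriging interpolates, the prediction at a training point recovers the training response (up to the nugget-induced error).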
1.3 Support vector regression (SVR)
Support Vector Regression (SVR) is a relatively new regression-type surrogate model. For a given training set of input-output pairs \((x_{i}, y_{i})\), i = 1, …, \(n_{p}\), where \(x_{i} \in \mathbb{R}^{n}\) and \(y_{i} \in \mathbb{R}\), a linear SVR is defined by \(f(x) = \langle w, x \rangle + b\), where b is a bias and ⟨·,·⟩ denotes the dot product. To train the SVR, the error, \(|\xi| = |y - f(x)|\), is minimized by solving the following convex optimization problem:

$$\begin{aligned} \min_{w,\, b,\, \xi,\, \tilde{\xi}} \quad & \frac{1}{2}\|w\|^{2} + C \sum\limits_{i=1}^{n_{p}} \left(\xi_{i} + \tilde{\xi}_{i}\right) \\ \text{s.t.} \quad & y_{i} - \langle w, x_{i} \rangle - b \leq \varepsilon + \xi_{i} \\ & \langle w, x_{i} \rangle + b - y_{i} \leq \varepsilon + \tilde{\xi}_{i} \\ & \xi_{i},\, \tilde{\xi}_{i} \geq 0, \quad i = 1, \ldots, n_{p} \end{aligned} \qquad (14)$$

In (14), ε ≥ 0 is the tolerance on the difference between the actual and the predicted values; \(\xi _{i}\) and \(\tilde {\xi }_{i}\) are the slack variables; C controls the trade-off between the flatness of the function and the amount by which deviations larger than ε are tolerated; and \(n_{p}\) is the number of training points. By applying kernel functions, \(K(\alpha, \beta) = \langle \phi(\alpha), \phi(\beta) \rangle\), under the KKT conditions, the original problem is mapped into a higher-dimensional space. The dual form of SVR for nonlinear regression can be represented as

$$f(x) = \sum\limits_{i=1}^{n_{p}} \left(\alpha_{i} - \tilde{\alpha}_{i}\right) K(x_{i}, x) + b \qquad (15)$$
The standard kernel functions used in SVR are listed in Table 5. The performance of SVR depends on its penalty parameter C and the kernel parameters γ, r, and d. Using hyper-parameter optimization, these parameters can be estimated so as to minimize the model error. To implement SVR in this paper, the LIBSVM (A Library for Support Vector Machines) package (Chang and Lin 2011) is used.
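The structure of the dual-form predictor can be sketched directly in numpy. Here the net dual coefficients (α_i − α̃_i), the support vectors, and the bias are illustrative placeholders rather than the solution of the dual QP, which a solver such as LIBSVM would compute.

```python
import numpy as np

def rbf_kernel(A, x, gamma=1.0):
    """Gaussian (RBF) kernel K(a, x) = exp(-gamma * ||a - x||^2),
    one of the standard SVR kernels listed in Table 5."""
    return np.exp(-gamma * np.sum((A - x) ** 2, axis=-1))

def svr_predict(X_sv, dual_coef, b, x, gamma=1.0):
    """Dual-form SVR prediction: f(x) = sum_i (alpha_i - alpha_tilde_i) K(x_i, x) + b."""
    return float(dual_coef @ rbf_kernel(X_sv, x, gamma) + b)

# Illustrative support vectors and placeholder net dual coefficients
X_sv = np.array([[0.0], [1.0]])
dual_coef = np.array([1.0, -0.5])
b = 0.2
f0 = svr_predict(X_sv, dual_coef, b, np.array([0.0]))
```

Only the support vectors (points with nonzero net dual coefficients) contribute to the prediction, which is what keeps trained SVR models sparse.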
Appendix B: Analytical test functions
Branin-Hoo function (2 variables):

$$f(x) = \left(x_{2} - \frac{5.1}{4\pi^{2}}x_{1}^{2} + \frac{5}{\pi}x_{1} - 6\right)^{2} + 10\left(1 - \frac{1}{8\pi}\right)\cos(x_{1}) + 10$$

where \(x_{1} \in [-5, 10]\), \(x_{2} \in [0, 15]\).
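A direct implementation of the Branin-Hoo function; its known global minimum value, 10/(8π) ≈ 0.397887, provides a sanity check.

```python
import numpy as np

def branin(x1, x2):
    """Branin-Hoo benchmark; x1 in [-5, 10], x2 in [0, 15]."""
    b = 5.1 / (4 * np.pi ** 2)
    c = 5 / np.pi
    return (x2 - b * x1 ** 2 + c * x1 - 6) ** 2 \
        + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10
```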
Hartmann function (3 variables):

$$f(x) = -\sum\limits_{i=1}^{4} c_{i} \exp\left(-\sum\limits_{j=1}^{n} A_{ij}\left(x_{j} - P_{ij}\right)^{2}\right)$$

where \(x = (x_{1}\ x_{2} \ldots x_{n})\), \(x_{i} \in [0, 1]\).

In this function, the number of variables, n = 3; the constants c, A, and P are respectively a 1 × 4 vector, a 4 × 3 matrix, and a 4 × 3 matrix:

$$c = (1,\ 1.2,\ 3,\ 3.2), \quad A = \begin{pmatrix} 3 & 10 & 30\\ 0.1 & 10 & 35\\ 3 & 10 & 30\\ 0.1 & 10 & 35 \end{pmatrix}, \quad P = 10^{-4}\begin{pmatrix} 3689 & 1170 & 2673\\ 4699 & 4387 & 7470\\ 1091 & 8732 & 5547\\ 381 & 5743 & 8828 \end{pmatrix}$$
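The Hartmann-3 function can be evaluated with a few lines of numpy, using the standard constants c, A, and P for the 3-variable case; its known global minimum, ≈ −3.86278, serves as a sanity check.

```python
import numpy as np

# Standard Hartmann-3 constants: c (1 x 4), A (4 x 3), P (4 x 3)
c = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([[3.0, 10, 30], [0.1, 10, 35], [3.0, 10, 30], [0.1, 10, 35]])
P = 1e-4 * np.array([[3689, 1170, 2673], [4699, 4387, 7470],
                     [1091, 8732, 5547], [381, 5743, 8828]])

def hartmann3(x):
    """Hartmann benchmark for n = 3; x_i in [0, 1]."""
    x = np.asarray(x, dtype=float)
    inner = np.sum(A * (x - P) ** 2, axis=1)   # one exponent per i = 1..4
    return -np.sum(c * np.exp(-inner))
```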
Perm function (10 variables):

$$f(x) = \sum\limits_{k=1}^{n} \left[ \sum\limits_{j=1}^{n} \left(j^{k} + \beta\right) \left(\left(\frac{x_{j}}{j}\right)^{k} - 1\right) \right]^{2}$$

where \(x_{j} \in [-n, n]\) and β is a fixed parameter.
Dixon & Price function (50 variables):

$$f(x) = \left(x_{1} - 1\right)^{2} + \sum\limits_{i=2}^{n} i \left(2x_{i}^{2} - x_{i-1}\right)^{2}$$

where \(x_{i} \in [-10, 10]\).
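The standard Dixon & Price form is easy to verify numerically: at x = (1, 1, …, 1) the first term vanishes and the sum reduces to Σ_{i=2}^{50} i = 1274.

```python
import numpy as np

def dixon_price(x):
    """Dixon & Price benchmark; x_i in [-10, 10]."""
    x = np.asarray(x, dtype=float)
    i = np.arange(2, len(x) + 1)                          # indices i = 2..n
    return (x[0] - 1) ** 2 + np.sum(i * (2 * x[1:] ** 2 - x[:-1]) ** 2)
```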
Appendix C: Relationship between the error of the model selected by COSMOS and the size and distribution of the training data set
In this section, we explore the relationship between the error of the model selected by COSMOS and the size and distribution of the training data set, for the Branin-Hoo benchmark function. A single model-kernel combination is considered here (RBF with the Multiquadric basis function). A set of 200 training data sets is randomly generated: \(X_{1}, X_{2}, X_{3}, \ldots, X_{200}\). The sizes of the training sets are defined to be: \(X_{1}\) to \(X_{40}\) contain 30 points each, \(X_{41}\) to \(X_{80}\) contain 60, \(X_{81}\) to \(X_{120}\) contain 90, \(X_{121}\) to \(X_{160}\) contain 120, and \(X_{161}\) to \(X_{200}\) contain 150. The distribution of samples differs (randomly) across sets of the same size. For each data set, COSMOS is applied to find the hyper-parameter value that minimizes the median error metric.
The median error of the selected model for the different data sets is illustrated in Fig. 21 as a series of boxplots, with each boxplot corresponding to sample sets of one given size. It is readily evident that the error of the selected model is highly sensitive to the size of the training data set. Although the influence of the training data distribution is also evident from the significant observed variance in the resulting model error, no particular trend is observed.
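The size-sensitivity trend reported above can be reproduced in miniature with a numpy sketch: fit a multiquadric RBF to random Branin-Hoo samples of increasing size and record the median absolute error on held-out points. The sample sizes match Fig. 21, but the test-point count and σ = 1 are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
lo, hi = np.array([-5.0, 0.0]), np.array([10.0, 15.0])   # Branin-Hoo domain

def branin(x1, x2):
    b, c = 5.1 / (4 * np.pi ** 2), 5 / np.pi
    return (x2 - b * x1 ** 2 + c * x1 - 6) ** 2 \
        + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10

def multiquadric(r, sigma=1.0):
    return np.sqrt(r ** 2 + sigma ** 2)

def median_holdout_error(n_train, n_test=500):
    """Median absolute error of a multiquadric RBF fit to n_train random samples."""
    X = lo + (hi - lo) * rng.random((n_train, 2))
    y = branin(X[:, 0], X[:, 1])
    Psi = multiquadric(np.linalg.norm(X[:, None] - X[None, :], axis=-1))
    w = np.linalg.pinv(Psi) @ y                           # pseudo-inverse weights
    Xt = lo + (hi - lo) * rng.random((n_test, 2))
    pred = multiquadric(np.linalg.norm(Xt[:, None] - X[None, :], axis=-1)) @ w
    return np.median(np.abs(pred - branin(Xt[:, 0], Xt[:, 1])))

errors = {n: median_holdout_error(n) for n in (30, 60, 90, 120, 150)}
```

With a fixed seed, the median error falls sharply as the training-set size grows, mirroring the boxplot trend in Fig. 21.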
Appendix D: Implementation of COSMOS on Hartmann and Perm functions
The final solutions, including the best trade-offs between the median and the maximum errors (in the Φ 0, Φ 1, and Φ 2 classes) for the Hartmann and Perm functions, are illustrated in Figs. 22 and 23. For the Hartmann test function, RBF with the Gaussian and Multiquadric basis functions and Kriging with the Gaussian correlation function under the Φ 1 class constitute the set of Pareto models in both the Cascaded and One-Step techniques (Table 4). It is observed from Fig. 22 that, in this problem, the Pareto solutions given by COSMOS for the different hyper-parameter classes have a larger spread than those given by the actual error. However, in terms of the best trade-off models, there is fair agreement between the results of COSMOS and those determined from the actual errors (Table 4). Unlike the Branin-Hoo test function, for the Hartmann function there is noticeable overlap between the final solutions from the different hyper-parameter classes.
Figure 23 and Table 4 show that, for the Perm test function, at least one model-kernel combination from each of the three classes (Φ 0, Φ 1, and Φ 2) contributes to the Pareto optimal set. In this test problem, Kriging with the Linear correlation function and SVR with the Sigmoid kernel function are selected as the best models with the lowest median error and the lowest maximum error, respectively. It can be seen from Table 4 that there is promising agreement between the model-kernel combinations chosen by COSMOS and those chosen based on the actual error. From the COSMOS and actual-error results, we also observe that a Pareto solution from the Φ 0 class (RBF-Linear) is located at the elbow of the Pareto frontier, which could be considered to represent a practically attractive, best trade-off model choice.
Mehmani, A., Chowdhury, S., Meinrenken, C. et al. Concurrent surrogate model selection (COSMOS): optimizing model type, kernel function, and hyper-parameters. Struct Multidisc Optim 57, 1093–1114 (2018). https://doi.org/10.1007/s00158-017-1797-y