Abstract
The selection of parameters or hyper-parameters has a great impact on the performance of a data-driven model. This chapter introduces some commonly used parameter optimization and estimation methods, including gradient-based methods (e.g., gradient descent, Newton's method, and the conjugate gradient method) and intelligent optimization methods (e.g., the genetic algorithm, the differential evolution algorithm, and particle swarm optimization). In particular, the conjugate gradient method is employed to optimize the hyper-parameters of an LSSVM model based on noise estimation, which alleviates the impact of noise on the performance of the LSSVM. For dynamic models, this chapter introduces nonlinear Kalman-filter methods for parameter estimation, the best known of which are the extended Kalman filter, the unscented Kalman filter, and the cubature Kalman filter. Here, a dual estimation model based on two Kalman filters is illustrated, which simultaneously estimates the uncertainties of the internal state and the output. In addition, probabilistic methods for parameter estimation are introduced; in particular, a Bayesian model within a variational inference framework is elaborated in detail. Within this framework, a variational relevance vector machine (RVM) model based on an automatic relevance determination kernel is introduced, which provides approximate posterior distributions over the kernel parameters. Finally, we present several case studies employing industrial data.
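As a minimal illustration of the simplest gradient-based method named above, the following Python sketch applies plain gradient descent to a toy quadratic objective. It is our own illustrative example, not code from the chapter: the matrix A, the step size lr, and the tolerance tol are all assumed values chosen so the iteration converges.

```python
# A minimal sketch (illustrative, not the chapter's implementation):
# gradient descent on f(x) = 0.5 * x^T A x - b^T x, whose gradient is A x - b.
import numpy as np

def gradient_descent(A, b, x0, lr=0.1, tol=1e-8, max_iter=1000):
    """Minimize 0.5*x'Ax - b'x by repeatedly stepping against the gradient."""
    x = x0.astype(float)          # work on a copy of the starting point
    for _ in range(max_iter):
        grad = A @ x - b          # gradient of the quadratic objective
        if np.linalg.norm(grad) < tol:
            break                 # stop once the gradient is (near) zero
        x -= lr * grad            # descent step with fixed learning rate
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
x_star = gradient_descent(A, b, x0=np.zeros(2))
print(x_star, np.linalg.solve(A, b))      # both should agree closely
```

For a quadratic like this, the conjugate gradient method discussed in the chapter would reach the minimizer in at most two iterations instead of relying on a hand-tuned step size, which is one reason it is preferred for hyper-parameter optimization.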
Cite this chapter
Zhao, J., Wang, W., Sheng, C. (2018). Parameter Estimation and Optimization. In: Data-Driven Prediction for Industrial Processes and Their Applications. Information Fusion and Data Science. Springer, Cham. https://doi.org/10.1007/978-3-319-94051-9_7