Abstract
The selection of parameters or hyper-parameters has a great impact on the performance of a data-driven model. This chapter introduces some commonly used parameter optimization and estimation methods, including gradient-based methods (e.g., gradient descent, Newton's method, and the conjugate gradient method) and intelligent optimization methods (e.g., the genetic algorithm, the differential evolution algorithm, and particle swarm optimization). In particular, the conjugate gradient method is employed to optimize the hyper-parameters of an LSSVM model based on noise estimation, which alleviates the impact of noise on the performance of the LSSVM. For dynamic models, this chapter introduces nonlinear Kalman-filter methods for parameter estimation, the best known of which are the extended Kalman filter, the unscented Kalman filter, and the cubature Kalman filter. Here, a dual estimation model based on two Kalman filters is illustrated, which simultaneously estimates the uncertainties of the internal state and the output. In addition, probabilistic methods for parameter estimation are introduced; in particular, a Bayesian model within a variational inference framework is elaborated in detail. Within this framework, a variational relevance vector machine (RVM) model based on an automatic relevance determination kernel is introduced, which provides approximate posterior distributions over the kernel parameters. Finally, we present several case studies employing industrial data.
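As a minimal illustration of the simplest gradient-based method named above, the following Python sketch applies plain gradient descent to a toy quadratic objective. It is our own illustrative example, not code from the chapter: the matrix A, the step size lr, and the tolerance tol are all assumed values chosen so the iteration converges.

```python
# A minimal sketch (illustrative, not the chapter's implementation):
# gradient descent on f(x) = 0.5 * x^T A x - b^T x, whose gradient is A x - b.
import numpy as np

def gradient_descent(A, b, x0, lr=0.1, tol=1e-8, max_iter=1000):
    """Minimize 0.5*x'Ax - b'x by repeatedly stepping against the gradient."""
    x = x0.astype(float)          # work on a copy of the starting point
    for _ in range(max_iter):
        grad = A @ x - b          # gradient of the quadratic objective
        if np.linalg.norm(grad) < tol:
            break                 # stop once the gradient is (near) zero
        x -= lr * grad            # descent step with fixed learning rate
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])
x_star = gradient_descent(A, b, x0=np.zeros(2))
print(x_star, np.linalg.solve(A, b))      # both should agree closely
```

For a quadratic like this, the conjugate gradient method discussed in the chapter would reach the minimizer in at most two iterations instead of relying on a hand-tuned step size, which is one reason it is preferred for hyper-parameter optimization.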
Cite this chapter
Zhao, J., Wang, W., Sheng, C. (2018). Parameter Estimation and Optimization. In: Data-Driven Prediction for Industrial Processes and Their Applications. Information Fusion and Data Science. Springer, Cham. https://doi.org/10.1007/978-3-319-94051-9_7