
Parameter Estimation and Optimization


Part of the book series: Information Fusion and Data Science (IFDS)

Abstract

The choice of parameters or hyper-parameters has a great impact on the performance of a data-driven model. This chapter introduces commonly used parameter optimization and estimation methods, including gradient-based methods (e.g., gradient descent, Newton's method, and the conjugate gradient method) and intelligent optimization methods (e.g., the genetic algorithm, the differential evolution algorithm, and particle swarm optimization). In particular, the conjugate gradient method is employed to optimize the hyper-parameters of an LSSVM model based on noise estimation, which alleviates the impact of noise on the performance of the LSSVM. For dynamic models, this chapter introduces nonlinear Kalman-filter methods for parameter estimation, the best known of which are the extended Kalman filter, the unscented Kalman filter, and the cubature Kalman filter. A dual estimation model based on two Kalman filters is illustrated, which simultaneously estimates the uncertainties of the internal state and the output. Probabilistic methods for parameter estimation are also introduced, in which a Bayesian model, and in particular a variational inference framework, is elaborated in detail. Within this framework, a variational relevance vector machine (RVM) model based on an automatic relevance determination kernel is introduced, which provides approximate posterior distributions over the kernel parameters. Finally, we present case studies on several industrial datasets.
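
To make the gradient-based family concrete, the short Python sketch below applies the Fletcher-Reeves nonlinear conjugate gradient method to a toy quadratic objective. This is an illustrative assumption, not the chapter's own code: the objective f, its gradient, and the line-search constants are made up for demonstration, whereas the chapter applies the conjugate gradient idea to the hyper-parameters of a noise-estimation-based LSSVM.

import numpy as np

def f(x):
    # Toy quadratic loss standing in for a hyper-parameter objective (illustrative only).
    return 0.5 * x[0] ** 2 + 2.0 * x[1] ** 2

def grad_f(x):
    # Analytic gradient of the toy loss.
    return np.array([x[0], 4.0 * x[1]])

def fletcher_reeves_cg(x0, max_iter=100, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    d = -g                                  # first direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha, c, rho = 1.0, 1e-4, 0.5      # backtracking (Armijo) line search
        while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
            alpha *= rho
        x_new = x + alpha * d
        g_new = grad_f(x_new)
        beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves coefficient
        d = -g_new + beta * d               # conjugate direction update
        x, g = x_new, g_new
    return x

print(fletcher_reeves_cg([3.0, -2.0]))      # approaches the minimizer (0, 0)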




Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Zhao, J., Wang, W., Sheng, C. (2018). Parameter Estimation and Optimization. In: Data-Driven Prediction for Industrial Processes and Their Applications. Information Fusion and Data Science. Springer, Cham. https://doi.org/10.1007/978-3-319-94051-9_7


  • DOI: https://doi.org/10.1007/978-3-319-94051-9_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94050-2

  • Online ISBN: 978-3-319-94051-9

  • eBook Packages: Computer Science (R0)
