A combined Bayes — maximum likelihood method for regression

  • Alexei Chervonenkis
  • Alex Gammerman
  • Mark Herbster
Part of the International Centre for Mechanical Sciences book series (CISM, volume 431)


In this paper we propose an efficient method for model selection. We apply it to select the degree of regularization, together with either the number of basis functions or the parameters of a kernel function, to be used in a regression of the data. The method combines the well-known Bayesian approach with the maximum likelihood method: the Bayesian approach is applied to a set of models with conventional priors that depend on unknown parameters, and the maximum likelihood method is used to determine these parameters. When the parameter values determine the complexity of a model, the method thereby also selects the model complexity. Under the assumption of Gaussian noise, the method leads to a computationally feasible procedure for determining the optimum number of basis functions and the degree of regularization in ridge regression. This procedure is an inexpensive alternative to cross-validation. In the non-Gaussian case we show connections to support vector methods. We also present experimental results comparing this method to other methods of model complexity selection, including cross-validation.
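The Gaussian-noise case described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: it assumes a polynomial basis, a zero-mean isotropic Gaussian prior on the weights with precision `alpha`, a known noise variance `noise_var`, and a coarse grid search standing in for a proper maximization of the likelihood. Under those assumptions the data have marginal distribution y ~ N(0, noise_var·I + ΦΦᵀ/alpha), and the pair (degree, alpha) with the highest marginal likelihood is selected. All function and variable names are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(s) if i == j else s / l[j][j]
    return l


def log_evidence(xs, ys, degree, alpha, noise_var):
    """Log marginal likelihood of ys under y ~ N(0, noise_var*I + Phi Phi^T / alpha).

    Assumptions (not from the source): polynomial basis 1, x, ..., x^degree,
    weight prior N(0, I/alpha), and a known noise variance.
    """
    n = len(xs)
    phi = [[x ** d for d in range(degree + 1)] for x in xs]
    # Covariance of the observations after integrating out the weights.
    c = [[sum(p * q for p, q in zip(phi[i], phi[j])) / alpha
          + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    l = cholesky(c)
    # Forward substitution: solve L z = y, so that y^T C^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i])
    log_det = 2.0 * sum(math.log(l[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (quad + log_det + n * math.log(2.0 * math.pi))


def select_model(xs, ys, degrees, alphas, noise_var):
    """Grid search for the (degree, alpha) pair with the highest log evidence."""
    candidates = [(d, a) for d in degrees for a in alphas]
    return max(candidates,
               key=lambda da: log_evidence(xs, ys, da[0], da[1], noise_var))


if __name__ == "__main__":
    # Synthetic data (illustrative only): a quadratic plus Gaussian noise.
    rng = random.Random(0)
    xs = [i / 10.0 - 1.0 for i in range(21)]
    ys = [1.0 - 2.0 * x + 3.0 * x * x + rng.gauss(0.0, 0.1) for x in xs]
    degree, alpha = select_model(xs, ys, range(6),
                                 [0.01, 0.1, 1.0, 10.0, 100.0], 0.01)
    print("selected degree:", degree, "prior precision:", alpha)
```

Each candidate needs a single Cholesky factorization on the full data set, whereas k-fold cross-validation would refit every candidate k times; this is one way to read the abstract's remark that the procedure is inexpensive compared with cross-validation.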


Support Vector Machine; Covariance Function; Penalty Function; Support Vector Regression; Maximum Likelihood Method
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Wien 2001

Authors and Affiliations

  • Alexei Chervonenkis (1)
  • Alex Gammerman (2)
  • Mark Herbster (2)
  1. Institute of Control Sciences, Moscow GSP-4, Russia
  2. Computer Learning Research Centre, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, England
