Multiparameter Regularization for Construction of Extrapolating Estimators in Statistical Learning Theory

  • Shuai Lu
  • Sergiy Pereverzyev Jr.
  • Sivananthan Sampath


One-parameter regularization methods, such as Tikhonov regularization, are used to solve the operator equation for the estimator in statistical learning theory. Recently, there has been a lot of interest in the construction of so-called extrapolating estimators, which approximate the input–output relationship beyond the scope of the empirical data. Standard Tikhonov regularization produces rather poor extrapolating estimators. In this paper, we propose a novel view of the operator equation for the estimator, in which this equation is seen as a perturbed version of the operator equation for the ideal estimator. This view suggests dual regularized total least squares (DRTLS) and multi-penalty regularization (MPR), which are multi-parameter regularization methods, as methods of choice for constructing better extrapolating estimators. We propose and test several realizations of DRTLS and MPR for constructing extrapolating estimators. It will be seen that, among the considered realizations, a realization of MPR gives the best extrapolating estimators. For this realization, we propose a rule for choosing the regularization parameters that allows an automatic selection of a suitable extrapolating estimator.
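The multi-penalty idea behind MPR can be illustrated with a minimal sketch. This is not the realization studied in the paper: the Gaussian kernel, the particular pair of penalties (an RKHS-norm penalty plus a Euclidean penalty on the coefficient vector), and all parameter values below are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(x1, x2, width=1.0):
    """Gram matrix of a Gaussian (RBF) kernel between two 1-D samples."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-(d ** 2) / (2.0 * width ** 2))


def mpr_coefficients(K, y, lam1, lam2):
    """Coefficients of a two-parameter (multi-penalty) regularized estimator.

    Minimizes  ||K c - y||^2 + lam1 * c^T K c + lam2 * ||c||^2,
    i.e. an RKHS-norm penalty combined with a Euclidean penalty.
    Setting the gradient to zero yields the linear system
        (K^2 + lam1 * K + lam2 * I) c = K y.
    """
    n = K.shape[0]
    return np.linalg.solve(K @ K + lam1 * K + lam2 * np.eye(n), K @ y)


def predict(K_new, c):
    """Evaluate the kernel expansion f(x) = sum_i c_i k(x_i, x)."""
    return K_new @ c
```

With `lam2 = 0` and an invertible Gram matrix, the system reduces to the one-parameter kernel Tikhonov equation `(K + lam1 * I) c = y`; the second parameter adds an extra degree of freedom that multi-parameter methods exploit.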





S. Lu is supported by the National Natural Science Foundation of China (No. 11101093) and the Shanghai Science and Technology Commission (No. 11ZR1402800, No. 11PJ1400800). S. Sampath is supported by the EU project “DIAdvisor”, carried out within the 7th Framework Programme of the EC.



Copyright information

© Springer New York 2013

Authors and Affiliations

  • Shuai Lu (1)
  • Sergiy Pereverzyev Jr. (2)
  • Sivananthan Sampath (3)
  1. School of Mathematical Science, Fudan University, Shanghai, China
  2. Industrial Mathematics Institute, Johannes Kepler University of Linz, Linz, Austria
  3. Johann Radon Institute for Computational and Applied Mathematics, Austrian Academy of Sciences, Linz, Austria
