Efficiency of classification methods based on empirical risk minimization

  • V. I. Norkin
  • M. A. Keyzer

A binary classification problem is reduced to the minimization of convex regularized empirical risk functionals in a reproducing kernel Hilbert space. The solution is sought as a finite linear combination of kernel support functions (Vapnik’s support vector machines). Estimates of the misclassification risk as a function of the training sample size and other model parameters are obtained.
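For illustration, below is a minimal sketch of the kind of estimator the abstract describes: regularized empirical risk minimization over a reproducing kernel Hilbert space, with the minimizer written as a finite kernel expansion over the training points. The Gaussian kernel, the squared loss (which admits a closed-form solution), and all parameter values are assumptions made for the example; the paper’s analysis concerns general convex losses such as the SVM hinge loss, for which a convex solver would be used instead.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_kernel_erm(X, y, lam=0.1, gamma=1.0):
    """Regularized empirical risk minimization with squared loss:
    minimize (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_H^2 over f in the RKHS.
    By the representer theorem the minimizer is f(x) = sum_i c_i K(x_i, x),
    and the coefficients solve the linear system (K + lam * n * I) c = y."""
    n = len(y)
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, c, X_new, gamma=1.0):
    """Classify by the sign of the kernel expansion f(x) = sum_i c_i K(x_i, x)."""
    return np.sign(gaussian_kernel(X_new, X_train, gamma) @ c)

# Toy usage: two Gaussian blobs with labels -1 / +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]
c = fit_kernel_erm(X, y)
print("training accuracy:", np.mean(predict(X, c, X) == y))
```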

Keywords

machine learning, classification, recognition, empirical risk minimization, support vector machine (SVM), consistency, rate of convergence

References

  1. V. N. Vapnik, Statistical Learning Theory, Wiley, New York (1998).
  2. L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer, New York (1996).
  3. C. Stone, “Consistent nonparametric regression,” Ann. Statistics, 5, 595–645 (1977).
  4. V. N. Vapnik and A. Ya. Chervonenkis, Pattern Recognition Theory: Statistical Problems of Learning [in Russian], Nauka, Moscow (1974).
  5. V. N. Vapnik, Estimation of Dependences Based on Empirical Data [in Russian], Nauka, Moscow (1979).
  6. M. A. Aizerman, E. M. Braverman, and L. I. Rozonoer, Potential Function Method in Machine Learning Theory [in Russian], Nauka, Moscow (1970).
  7. B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA (2002).
  8. I. Steinwart and A. Christmann, Support Vector Machines, Springer, New York (2008).
  9. S. Boucheron, O. Bousquet, and G. Lugosi, “Theory of classification: A survey of some recent advances,” ESAIM: Probability and Statistics, 9, 323–375 (2005).
  10. M. I. Schlesinger and V. Hlaváč, Ten Lectures on Statistical and Structural Pattern Recognition, Kluwer Acad. Publ. (2004).
  11. L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer, New York–Berlin–Heidelberg (2002).
  12. A. M. Gupal, S. V. Pashko, and I. V. Sergienko, “Efficiency of Bayesian classification procedure,” Cybern. Syst. Analysis, 31, No. 4, 543–554 (1995).
  13. I. V. Sergienko and A. M. Gupal, “Optimal pattern recognition procedures and their application,” Cybern. Syst. Analysis, 43, No. 6, 799–809 (2007).
  14. A. M. Gupal and I. V. Sergienko, Optimal Pattern Recognition Procedures [in Russian], Naukova Dumka, Kyiv (2008).
  15. T. Poggio and S. Smale, “The mathematics of learning: Dealing with data,” Notices Amer. Math. Soc., 50, No. 5, 537–544 (2003).
  16. R. Koenker and G. W. Bassett, “Regression quantiles,” Econometrica, 46, 33–50 (1978).
  17. R. Koenker, Quantile Regression, Cambridge Univ. Press, Cambridge–New York (2005).
  18. Yu. M. Ermoliev and A. I. Yastremskii, Stochastic Models and Methods in Economic Planning [in Russian], Nauka, Moscow (1979).
  19. Y. M. Ermoliev and G. Leonardi, “Some proposals for stochastic facility location models,” Math. Modelling, 3, 407–420 (1982).
  20. A. Ruszczynski and A. Shapiro (eds.), Stochastic Programming, Vol. 10 of Handbooks in Operations Research and Management Science, Elsevier, Amsterdam (2003).
  21. F. Cucker and S. Smale, “On the mathematical foundations of learning,” Bull. Amer. Math. Soc. (N.S.), 39, No. 1, 1–49 (2002).
  22. N. Aronszajn, “Theory of reproducing kernels,” Matematika, 7, No. 2, 67–130 (1963).
  23. A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics, Kluwer Acad. Publ., Dordrecht–Boston–London (2004).
  24. A. N. Tikhonov and V. Ya. Arsenin, Methods of Solving Ill-Posed Problems [in Russian], Nauka, Moscow (1986).
  25. F. P. Vasil’ev, Methods for Solving Extremal Problems: Minimization Problems in Functional Spaces, Regularization, and Approximation [in Russian], Nauka, Moscow (1981).
  26. G. Wahba, “Spline models for observational data,” CBMS-NSF Regional Conference Series in Applied Mathematics, 59, SIAM, Philadelphia, PA (1990).
  27. M. A. Keyzer, “Rule-based and support vector (SV-) regression/classification algorithms for joint processing of census, map, survey and district data,” Working Paper WP-05-01, Centre for World Food Studies, Amsterdam (2005); http://www.sow.vu.nl/pdf/wp05.01.pdf.
  28. R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer, Berlin (1998).
  29. O. Bousquet and A. Elisseeff, “Stability and generalization,” J. Mach. Learn. Res., No. 2, 499–526 (2002).
  30. S. Smale and D. X. Zhou, “Shannon sampling. II: Connections to learning theory,” Appl. Comput. Harmon. Anal., 19, No. 3, 285–302 (2005).
  31. E. De Vito, A. Caponnetto, and L. Rosasco, “Model selection for regularized least-squares algorithm in learning theory,” Found. Comput. Math., 5, No. 1, 59–85 (2005).
  32. V. I. Norkin and M. A. Keyzer, “On convergence of kernel learning estimators,” in: L. Sakalauskas, O. W. Weber, and E. K. Zavadskas (eds.), Proc. 20th EURO Mini Conf. on Continuous Optimization and Knowledge-Based Technologies (EUROPT-2008), Inst. Math. and Inform., Vilnius (2008), pp. 306–310.
  33. V. I. Norkin and M. A. Keyzer, “Asymptotic efficiency of kernel support vector machines (SVM),” Cybern. Syst. Analysis, 45, No. 4, 575–588 (2009).

Copyright information

© Springer Science+Business Media, Inc. 2009

Authors and Affiliations

  1. V. M. Glushkov Institute of Cybernetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
  2. Centre for World Food Studies, Vrije Universiteit, Amsterdam, the Netherlands
