A binary classification problem is reduced to minimizing a convex regularized empirical risk functional in a reproducing kernel Hilbert space. The solution is sought as a finite linear combination of kernel support functions (Vapnik’s support vector machines). Estimates of the misclassification risk are obtained as functions of the training sample size and other model parameters.
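As an illustration of the setting described in the abstract, the following minimal sketch minimizes a regularized empirical risk over functions of the form f(x) = Σ_j α_j K(x_j, x). It is not the authors' procedure: the hinge loss, the Gaussian kernel, the functional subgradient step, and all names and parameter values (`fit_kernel_classifier`, `lam`, `lr`, `gamma`, `steps`) are assumptions chosen only to make the example concrete.

```python
import math

def gaussian_kernel(x, z, gamma=1.0):
    # Assumed kernel choice: K(x, z) = exp(-gamma * ||x - z||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def fit_kernel_classifier(X, y, lam=0.01, steps=200, lr=0.5, gamma=1.0):
    """Subgradient descent on (1/m) * sum_i max(0, 1 - y_i f(x_i)) + lam * ||f||_K^2,
    with f kept as a finite kernel expansion f = sum_j alpha[j] * K(X[j], .)."""
    m = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) for j in range(m)] for i in range(m)]
    alpha = [0.0] * m
    for _ in range(steps):
        # Current values f(x_i) on the training sample.
        f = [sum(alpha[j] * K[i][j] for j in range(m)) for i in range(m)]
        # Points whose hinge loss has a nonzero subgradient (margin below 1).
        active = [i for i in range(m) if y[i] * f[i] < 1.0]
        # The regularizer contributes 2 * lam * f, which shrinks every coefficient.
        alpha = [(1.0 - 2.0 * lr * lam) * a for a in alpha]
        # The empirical risk contributes -(1/m) * y_i * K(x_i, .) per active point.
        for i in active:
            alpha[i] += lr * y[i] / m
    return alpha

def predict(alpha, X, x, gamma=1.0):
    # Classify by the sign of the kernel expansion at x.
    score = sum(a * gaussian_kernel(xj, x, gamma) for a, xj in zip(alpha, X))
    return 1 if score >= 0 else -1

def empirical_error(alpha, X, y, gamma=1.0):
    # Fraction of training points that are misclassified.
    wrong = sum(1 for xi, yi in zip(X, y) if predict(alpha, X, xi, gamma) != yi)
    return wrong / len(X)
```

On a small, well-separated toy sample, `empirical_error` gives the training error rate only. The quantity estimated in the article is the misclassification risk under the data distribution, which this sketch does not compute.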
Additional information
Translated from Kibernetika i Sistemnyi Analiz, No. 5, pp. 93–105, September–October 2009.
Cite this article
Norkin, V.I., Keyzer, M.A. Efficiency of classification methods based on empirical risk minimization. Cybern Syst Anal 45, 750–761 (2009). https://doi.org/10.1007/s10559-009-9153-x
DOI: https://doi.org/10.1007/s10559-009-9153-x