Problems of Information Transmission, Volume 41, Issue 4, pp. 368–384

Recursive Aggregation of Estimators by the Mirror Descent Algorithm with Averaging

  • A. B. Juditsky
  • A. V. Nazin
  • A. B. Tsybakov
  • N. Vayatis
Methods of Signal Processing


We consider a recursive algorithm that constructs an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under an ℓ1-constraint. It is defined by a stochastic version of the mirror descent algorithm, which performs a gradient-type descent in the dual space with additional averaging. The main result of the paper is an upper bound on the expected accuracy of the proposed estimator. This bound is of order \(C\sqrt {(\log M)/t}\) with an explicit and small constant factor C, where M is the dimension of the problem and t is the sample size. A similar bound is proved in a more general setting, which covers, in particular, the regression model with squared loss.
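The recursion described in the abstract — a gradient-type descent in the dual space with an entropic mirror map on the simplex (the natural geometry for the ℓ1-constraint), followed by averaging of the iterates — can be sketched on a toy problem. Everything below is an illustrative assumption, not the paper's exact algorithm or constants: the risk is taken to be linear in the weights (expected losses `c` of M hypothetical base rules), the stochastic gradient is `c` plus Gaussian noise, and the step size is chosen as \(\beta_t \sim \sqrt{(\log M)/t}\) to mirror the rate in the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 5, 5000
# hypothetical expected losses of the M base decision rules
c = np.array([0.9, 0.5, 0.1, 0.7, 0.6])

def stoch_grad(t):
    # noisy subgradient of the (assumed) linear risk <c, theta>
    return c + 0.3 * rng.standard_normal(M)

theta = np.full(M, 1.0 / M)       # start at the center of the simplex
theta_sum = np.zeros(M)
for t in range(T):
    beta = np.sqrt(np.log(M) / (t + 1))   # step size ~ sqrt(log M / t)
    g = stoch_grad(t)
    w = theta * np.exp(-beta * g)          # entropic (multiplicative) update
    theta = w / w.sum()                    # project back onto the simplex
    theta_sum += theta
theta_bar = theta_sum / T                  # averaged aggregation weights

print(theta_bar.argmax())  # weight concentrates on the smallest-risk rule
```

The averaging step is what turns the sequence of noisy iterates into a stable estimator; the multiplicative form of the update is exactly what the entropic mirror map produces, and it keeps the weights on the simplex without an explicit projection.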







Copyright information

© MAIK "Nauka/Interperiodica" 2005

Authors and Affiliations

  • A. B. Juditsky (1)
  • A. V. Nazin (2)
  • A. B. Tsybakov (3)
  • N. Vayatis (3)
  1. Laboratoire de Modelisation et Calcul, Universite Grenoble I, France
  2. Institute of Control Sciences, RAS, Moscow, Russia
  3. Laboratoire de Probabilites et Modeles Aleatoires, Universite Paris VI, France
