Maximizing the Margin with Boosting

  • Gunnar Rätsch
  • Manfred K. Warmuth
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2375)


AdaBoost produces a linear combination of weak hypotheses. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current linear combination, i.e., by a hyperplane in the feature space spanned by the weak hypotheses. The improvement is attributed to the experimental observation that the distances (margins) of the examples to the separating hyperplane keep increasing even when the training error is already zero, that is, when all examples are on the correct side of the hyperplane. We give an iterative version of AdaBoost that explicitly maximizes the minimum margin of the examples. We bound the number of iterations and the number of hypotheses used in the final linear combination, which approximates the maximum margin hyperplane with a certain precision. Our modified algorithm essentially retains the exponential convergence properties of AdaBoost, and our result does not depend on the size of the hypothesis class.
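
The margin behaviour described in the abstract can be illustrated with a minimal sketch of plain AdaBoost. This is not the modified, margin-maximizing algorithm of the paper. The one-dimensional toy data, the threshold-stump base learner, and all function names below are assumptions made for illustration only. The sketch records the minimum normalized margin, min_i y_i f(x_i) / sum_t alpha_t, after every round.

```python
import math

def stump(threshold, sign):
    """Hypothetical weak hypothesis: h(x) = sign if x > threshold else -sign."""
    return lambda x: sign if x > threshold else -sign

def edge(h, weights, xs, ys):
    """Weighted edge sum_i d_i * y_i * h(x_i), a value in [-1, 1]."""
    return sum(d * y * h(x) for d, x, y in zip(weights, xs, ys))

def combined(ensemble, x):
    """Unnormalized linear combination f(x) = sum_t alpha_t * h_t(x)."""
    return sum(alpha * h(x) for alpha, h in ensemble)

def min_margin(ensemble, xs, ys):
    """Smallest normalized margin y_i * f(x_i) / sum_t alpha_t over the sample."""
    total = sum(alpha for alpha, _ in ensemble)
    return min(y * combined(ensemble, x) for x, y in zip(xs, ys)) / total

def adaboost(xs, ys, rounds):
    """Plain AdaBoost over threshold stumps; returns the ensemble and the
    minimum normalized margin observed after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    points = sorted(set(xs))
    # One threshold below all points (a constant hypothesis) plus midpoints.
    thresholds = [points[0] - 1.0] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    pool = [stump(t, s) for t in thresholds for s in (1, -1)]
    ensemble, history = [], []
    for _ in range(rounds):
        # Base learner: pick the stump with the largest weighted edge.
        h = max(pool, key=lambda g: edge(g, weights, xs, ys))
        gamma = edge(h, weights, xs, ys)
        if gamma <= 0.0 or gamma >= 1.0:
            break  # no useful stump, or a perfect one (alpha would be infinite)
        alpha = 0.5 * math.log((1 + gamma) / (1 - gamma))
        ensemble.append((alpha, h))
        # Exponential reweighting: misclassified examples gain weight.
        weights = [d * math.exp(-alpha * y * h(x)) for d, x, y in zip(weights, xs, ys)]
        z = sum(weights)
        weights = [d / z for d in weights]
        history.append(min_margin(ensemble, xs, ys))
    return ensemble, history

# Toy sample that no single stump classifies correctly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble, history = adaboost(xs, ys, rounds=50)
print("minimum margin after first round:", history[0])
print("minimum margin after last round: ", history[-1])
```

On this toy sample the minimum margin starts out negative, because some examples are still misclassified, and is positive by the last round, when every example lies on the correct side of the hyperplane. Plain AdaBoost is not guaranteed to reach the maximum possible margin, which is the gap the modified algorithm in the paper addresses.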


Keywords: Base Learner, Training Error, Generalization Error, Maximum Margin, Base Hypothesis
These keywords were added by machine and not by the authors.




Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Gunnar Rätsch (1)
  • Manfred K. Warmuth (2)
  1. RSISE, Australian National University, Canberra, Australia
  2. University of California at Santa Cruz, Santa Cruz, USA
