Bundle Methods for Structured Output Learning — Back to the Roots

  • Michal Uřičář
  • Vojtěch Franc
  • Václav Hlaváč
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7944)


Discriminative methods for learning structured output classifiers have been gaining popularity in recent years due to their successful applications in fields such as computer vision and natural language processing. Learning a structured output classifier leads to a convex minimization problem which is, however, hard to solve by standard algorithms in real-life settings. Significant effort has been put into the development of specialized solvers, among which the Bundle Method for Risk Minimization (BMRM) [1] is one of the most successful. The BMRM is a simplified variant of the bundle methods well known in the field of non-smooth optimization. In this paper, we propose two speed-up improvements of the BMRM: i) using the adaptive prox-term known from the original bundle methods, and ii) starting the optimization from a non-trivial initial solution. We combine both improvements with the multiple cutting plane model approximation [2]. Experiments on real-life data show consistently faster convergence, achieving a speedup of up to a factor of 9.7.
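The cutting-plane loop underlying the BMRM can be illustrated on a toy problem. The sketch below (a hypothetical 1D hinge-loss risk with made-up data, not the authors' implementation) repeatedly tightens a piecewise-linear lower bound on the risk and minimizes the reduced problem, here by brute force on a grid instead of a QP solver:

```python
# Toy sketch of the BMRM cutting-plane loop (hypothetical 1D example).
# We minimize F(w) = lam/2 * w^2 + R(w), where R is the average hinge
# risk of a tiny dataset, by iteratively adding cutting planes
# R(w) >= a_t * w + b_t and minimizing the reduced problem.

lam = 1.0
data = [(0.5, 1), (1.5, 1), (-1.0, -1)]  # hypothetical (x, y) samples

def risk(w):
    """Average hinge loss R(w)."""
    return sum(max(0.0, 1.0 - y * x * w) for x, y in data) / len(data)

def subgrad(w):
    """A subgradient of R at w (sum over margin-violating samples)."""
    return sum(-y * x for x, y in data if 1.0 - y * x * w > 0) / len(data)

def bmrm(max_iter=20, eps=1e-3):
    planes = []  # cutting planes (a_t, b_t) with R(w) >= a_t * w + b_t
    w = 0.0
    grid = [i / 1000.0 for i in range(-2000, 2001)]  # crude 1D "QP solver"
    for _ in range(max_iter):
        a = subgrad(w)
        planes.append((a, risk(w) - a * w))
        # reduced problem: lam/2 * w^2 + max_t (a_t * w + b_t)
        model = lambda v: lam / 2 * v * v + max(p * v + q for p, q in planes)
        w = min(grid, key=model)
        # stop once the model matches the true objective at w
        if lam / 2 * w * w + risk(w) - model(w) <= eps:
            break
    return w

w_star = bmrm()  # converges near w = 2/3 for this data
```

The paper's improvements modify exactly this loop: an adaptive prox-term adds a stabilizing quadratic centered at the best point found so far, and a non-trivial initial solution replaces the cold start `w = 0.0`.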


Structured Output Learning · Bundle Methods · Risk Minimization · Structured Output SVM


  1. Teo, C.H., Vishwanathan, S.V.N., Smola, A., Le, Q.V.: Bundle Methods for Regularized Risk Minimization. Journal of Machine Learning Research 11, 311–365 (2010)
  2. Uřičář, M., Franc, V.: Efficient Algorithm for Regularized Risk Minimization. In: CVWW 2012: Proceedings of the 17th Computer Vision Winter Workshop, pp. 57–64 (February 2012)
  3. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large Margin Methods for Structured and Interdependent Output Variables. Journal of Machine Learning Research 6, 1453–1484 (2005)
  4. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
  5. Bordes, A., Bottou, L., Gallinari, P.: SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent. Journal of Machine Learning Research 10, 1737–1754 (2009)
  6. Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 807–814. ACM Press (2007)
  7. Joachims, T., Finley, T., Yu, C.-N.J.: Cutting-Plane Training of Structural SVMs. Machine Learning 77(1), 27–59 (2009)
  8. Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Mathematical Programming 69, 111–147 (1995)
  9. Cheney, E.W., Goldstein, A.A.: Newton's method for convex programming and Tchebycheff approximation. Numerische Mathematik 1, 253–268 (1959)
  10. Lemaréchal, C.: Nonsmooth optimization and descent methods. Technical report, IIASA, Laxenburg, Austria (1978)
  11. Uřičář, M., Franc, V., Hlaváč, V.: Detector of facial landmarks learned by the structured output SVM. In: VISAPP, vol. 1, pp. 547–556. SciTePress (2012)
  12. Sonnenburg, S., Rätsch, G., Henschel, S., Widmer, C., Behr, J., Zien, A., de Bona, F., Binder, A., Gehl, C., Franc, V.: The SHOGUN machine learning toolbox. Journal of Machine Learning Research 11, 1799–1802 (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Michal Uřičář (1)
  • Vojtěch Franc (1)
  • Václav Hlaváč (1)

  1. Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague 6, Czech Republic
