Learning Diverse Models: The Coulomb Structured Support Vector Machine

  • Martin Schiegg
  • Ferran Diego
  • Fred A. Hamprecht
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9907)


In structured prediction, it is standard procedure to discriminatively train a single model that is then used to make a single prediction for each input. This practice is simple but risky in many ways. For instance, models are often designed with tractability rather than faithfulness in mind. To hedge against such model misspecification, it may be useful to train multiple models that are all a reasonable fit to the training data, in the hope that at least one of them makes more valid predictions than the single model of the standard procedure. We propose the Coulomb Structured SVM (CSSVM) as a means to obtain a full ensemble of different models at training time. At test time, these models can run independently and in parallel to make diverse predictions. We demonstrate on challenging tasks from computer vision that some of these diverse predictions have significantly lower task loss than the prediction of a single model, and that the CSSVM improves over state-of-the-art diversity-encouraging approaches.
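The claim that some ensemble predictions reach a lower task loss than a single model is typically checked with a best-of-M ("oracle") loss. The following is a minimal sketch of that evaluation only, not of the CSSVM training procedure. The Hamming loss, the function names `hamming_loss` and `oracle_loss`, and the toy labelings are assumptions chosen for illustration and do not come from the paper.

```python
from typing import Callable, Sequence

Labeling = Sequence[int]


def hamming_loss(y_true: Labeling, y_pred: Labeling) -> float:
    """Fraction of positions where the predicted label differs (assumed task loss)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("labelings must be non-empty and of equal length")
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)


def oracle_loss(
    y_true: Labeling,
    predictions: Sequence[Labeling],
    loss: Callable[[Labeling, Labeling], float] = hamming_loss,
) -> float:
    """Lowest task loss achieved by any of the M predictions for one input."""
    if len(predictions) == 0:
        raise ValueError("need at least one prediction")
    return min(loss(y_true, p) for p in predictions)


# Hypothetical toy data: one ground-truth labeling and three model outputs.
y_true = [0, 1, 1, 0]
ensemble_predictions = [
    [0, 0, 0, 0],  # stands in for the single-model prediction
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]

single_model_loss = hamming_loss(y_true, ensemble_predictions[0])
best_of_m_loss = oracle_loss(y_true, ensemble_predictions)
print(single_model_loss, best_of_m_loss)
```

Because the minimum is taken over all M predictions, the oracle loss can never exceed the loss of any individual ensemble member; the interesting question is how far below the single-model loss it falls.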


Structured output learning, Diverse predictions, Multiple output learning, Structured support vector machine



We would like to thank Abner Guzman-Rivera for making the (Div)MCL source code available.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Martin Schiegg (1, 2)
  • Ferran Diego (1)
  • Fred A. Hamprecht (1), email author
  1. University of Heidelberg, IWR/HCI, Heidelberg, Germany
  2. Robert Bosch GmbH, Stuttgart, Germany
