What, Where and How Many? Combining Object Detectors and CRFs

  • Ľubor Ladický
  • Paul Sturgess
  • Karteek Alahari
  • Chris Russell
  • Philip H. S. Torr
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6314)


Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have shown impressive results in the recent past. The next challenge is to integrate these algorithms and address the problem of scene understanding. This paper is a step towards this goal. We present a probabilistic framework for reasoning about regions, objects, and their attributes such as object class, location, and spatial extent. Our model is a Conditional Random Field defined on pixels, segments and objects. We define a global energy function for the model, which combines results from sliding window detectors with low-level pixel-based unary and pairwise relations. One of our primary contributions is to show that this energy function can be minimized efficiently. Experimental results show that our model achieves significant improvement over the baseline methods on the CamVid and PASCAL VOC datasets.
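
To make the idea of a single energy that mixes pixel-level terms with detector output concrete, the following is a minimal sketch and not the potentials used in the paper, which the abstract does not spell out. It assumes a Potts pairwise term on a 4-connected grid and a hypothetical detector term in which each detection is either accepted (earning a reward `strength` but paying `penalty` for every pixel in its box that disagrees with the detected class) or rejected at zero cost. All function names, weights and the toy data are illustrative assumptions, and the exhaustive search stands in for the efficient minimization the abstract refers to.

```python
from itertools import product


def unary_cost(labels, unary):
    """Sum the cost of the chosen label at every pixel."""
    return sum(
        unary[r][c][labels[r][c]]
        for r in range(len(labels))
        for c in range(len(labels[0]))
    )


def potts_cost(labels, weight):
    """Charge `weight` for each 4-connected neighbour pair with different labels."""
    height, width = len(labels), len(labels[0])
    cost = 0.0
    for r in range(height):
        for c in range(width):
            if c + 1 < width and labels[r][c] != labels[r][c + 1]:
                cost += weight
            if r + 1 < height and labels[r][c] != labels[r + 1][c]:
                cost += weight
    return cost


def detector_cost(labels, detection):
    """Hypothetical detector term: min over rejecting (0) or accepting the box.

    Accepting earns -strength but pays `penalty` per pixel inside the box
    whose label differs from the detected class.
    """
    (r0, c0, r1, c1), det_label, strength, penalty = detection
    inconsistent = sum(
        1
        for r in range(r0, r1)
        for c in range(c0, c1)
        if labels[r][c] != det_label
    )
    return min(0.0, -strength + penalty * inconsistent)


def energy(labels, unary, weight, detections):
    """Global energy: unary + pairwise + one term per detection."""
    return (
        unary_cost(labels, unary)
        + potts_cost(labels, weight)
        + sum(detector_cost(labels, d) for d in detections)
    )


def brute_force_min(height, width, n_labels, unary, weight, detections):
    """Exhaustive search over all labelings; only feasible for toy grids."""
    best = None
    for flat in product(range(n_labels), repeat=height * width):
        labels = [list(flat[r * width:(r + 1) * width]) for r in range(height)]
        value = energy(labels, unary, weight, detections)
        if best is None or value < best[0]:
            best = (value, labels)
    return best


# Toy 2x2 image with two classes: the top row weakly prefers class 0,
# the bottom row weakly prefers class 1 (unary[r][c][label] is a cost).
UNARY = [
    [[0.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
# One assumed detection of class 1 covering the whole image:
# (box as (row0, col0, row1, col1), class, strength, per-pixel penalty).
DETECTION = ((0, 0, 2, 2), 1, 3.0, 1.0)

without_detector = brute_force_min(2, 2, 2, UNARY, 0.5, [])
with_detector = brute_force_min(2, 2, 2, UNARY, 0.5, [DETECTION])
print(without_detector, with_detector)
```

In this toy setting the pixel terms alone label the top row as class 0, while adding the detection makes it cheaper to label the whole image as class 1. This illustrates how a detector hypothesis can override weak low-level evidence inside one energy function.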





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ľubor Ladický (1)
  • Paul Sturgess (1)
  • Karteek Alahari (1)
  • Chris Russell (1)
  • Philip H. S. Torr (1)
  1. Oxford Brookes University
