Figure-Ground Segmentation—Object-Based

  • Bastian Leibe

Abstract

Tracking with a moving camera is a challenging task due to the combined effects of scene activity and egomotion. As there is no longer a static image background from which moving objects can easily be distinguished, dedicated effort must be spent on detecting objects of interest in the input images and on determining their precise extent. In recent years, there has been considerable progress in the development of approaches that apply object detection and class-specific segmentation in order to facilitate tracking under such circumstances (“tracking-by-detection”). In this chapter, we will give an overview of the main concepts and techniques used in such tracking-by-detection systems. In detail, the chapter will present fundamental techniques and current state-of-the-art approaches for performing object detection, for obtaining detailed object segmentations from single images based on top–down and bottom–up cues, and for propagating this information over time.

Keywords

Object Detection, Appearance Model, Conditional Random Field, Window Location, Pedestrian Detection

Notes

Acknowledgements

Bastian Leibe’s research has been funded, in part, by the EU project EUROPA (ICT-2008-231888) and by the UMIC Cluster of Excellence (DFG EXC 89).

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. UMIC Research Centre, RWTH Aachen University, Aachen, Germany
