Part-Based Models for Finding People and Estimating Their Pose

  • Deva Ramanan

Abstract

This chapter surveys approaches to person detection and pose estimation based on part-based models. After a brief motivation for why parts are needed, the bulk of the chapter is split into three core sections: Representation, Inference, and Learning. We begin by describing various gradient-based and color descriptors for parts. We then turn to representations that encode structural relations between parts, describing extensions of classic pictorial structures models that capture occlusion and appearance relations. We use the formalism of probabilistic models to unify these representations and to introduce the issues of inference and learning. We describe various efficient inference algorithms designed for tree-structured models and focus on discriminative formalisms for learning model parameters. We end with applications to pedestrian detection, human pose estimation, and people tracking.

Keywords

Part Model; Inference Algorithm; Pedestrian Detection; Orientation Histogram; Part Template

These keywords were added by machine and not by the authors.

Notes

Acknowledgements

This work has been supported by NSF Grant 0954083 and ONR-MURI Grant N00014-10-1-0933.

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Department of Computer Science, University of California, Irvine, USA