Benchmark Datasets for Pose Estimation and Tracking

  • Mykhaylo Andriluka
  • Leonid Sigal
  • Michael J. Black

Abstract

This chapter discusses the need for standard datasets in the articulated pose estimation and tracking communities. It describes the datasets that are currently available and the performance of state-of-the-art methods on them. We discuss issues of ground-truth collection and quality, complexity of appearance and poses, evaluation metrics, and partitioning of data. We also discuss limitations of current datasets and possible directions for developing new datasets for future use.

Keywords

  • Action Recognition
  • Motion Capture
  • Motion Capture System
  • Motion Capture Data
  • Pose Estimation Algorithm

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Mykhaylo Andriluka (1)
  • Leonid Sigal (2)
  • Michael J. Black (3, 4)

  1. Max Planck Institute for Computer Science, Saarbrücken, Germany
  2. Disney Research, Pittsburgh, USA
  3. Max Planck Institute for Intelligent Systems, Tübingen, Germany
  4. Department of Computer Science, Brown University, Providence, USA
