2D Human Pose Estimation in TV Shows

Ferrari, Vittorio; Marín-Jiménez, Manuel; Zisserman, Andrew

doi:10.1007/978-3-642-03061-1_7


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5604)

Abstract

The goal of this work is fully automatic 2D human pose estimation in unconstrained TV shows and feature films. Direct pose estimation on this uncontrolled material is often too difficult, especially when nothing is known about the location, scale, pose, and appearance of the person, or even whether a person is present in the frame.

We propose an approach that progressively reduces the search space for body parts, to greatly facilitate the task for the pose estimator. Moreover, when video is available, we propose methods for exploiting the temporal continuity of both appearance and pose for improving the estimation based on individual frames.
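The abstract does not spell out the individual reduction stages, so the following is only a minimal sketch of the general idea, under stated assumptions: a person detector (hypothetical here) returns a window, the window is enlarged to cover the expected extent of the body parts, and the pose estimator then searches only inside that region. The names `Box`, `enlarge_and_clip`, and `search_space_fraction`, as well as the scale factors and frame size, are illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned pixel box; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def area(self) -> int:
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)


def enlarge_and_clip(det: Box, scale_x: float, scale_y: float,
                     width: int, height: int) -> Box:
    """Grow a detection window about its centre, then clip it to the image.

    The scale factors are assumptions for illustration only.
    """
    cx = (det.x0 + det.x1) / 2
    cy = (det.y0 + det.y1) / 2
    half_w = (det.x1 - det.x0) * scale_x / 2
    half_h = (det.y1 - det.y0) * scale_y / 2
    return Box(
        max(0, int(cx - half_w)),
        max(0, int(cy - half_h)),
        min(width, int(cx + half_w)),
        min(height, int(cy + half_h)),
    )


def search_space_fraction(region: Box, width: int, height: int) -> float:
    """Fraction of image positions the pose estimator still has to consider."""
    return region.area / (width * height)


# Hypothetical detection in a 720x576 frame.
detection = Box(300, 100, 400, 200)
region = enlarge_and_clip(detection, 2.0, 2.0, width=720, height=576)
fraction = search_space_fraction(region, width=720, height=576)
```

In this toy setting the pose estimator would examine roughly a tenth of the frame instead of all of it; further stages, such as a foreground segmentation inside the region, could shrink the candidate set again.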

The method is fully automatic and self-initializing, and explains the spatio-temporal volume covered by a person moving in a shot by soft-labeling every pixel as belonging to a particular body part or to the background. We demonstrate upper-body pose estimation by running our system on four episodes of the TV series Buffy the Vampire Slayer (i.e., three hours of video). Our approach is evaluated quantitatively on several hundred video frames, based on ground-truth annotation of 2D poses. Finally, we present an application to full-body action recognition on the Weizmann dataset.
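To make "soft-labeling" concrete, here is a small sketch of one way such labels could be represented: each pixel carries non-negative scores for every body part and for the background, and normalising them gives a distribution that sums to one. How the paper actually computes these scores is not stated in the abstract; the function name `soft_label`, the part names, and the numbers below are assumptions for illustration.

```python
from typing import Dict


def soft_label(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalise non-negative per-label scores for one pixel so they sum to 1."""
    if any(value < 0 for value in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {label: value / total for label, value in scores.items()}


# Hypothetical scores for a single pixel.
pixel_scores = {"torso": 3.0, "head": 1.0, "background": 4.0}
pixel_labels = soft_label(pixel_scores)
```

Unlike a hard segmentation, this keeps the uncertainty at each pixel: here the pixel is judged half background, with the remainder split between torso and head.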



Author information

Authors and Affiliations

ETH Zurich, Switzerland
Vittorio Ferrari
University of Granada, Spain
Manuel Marín-Jiménez
University of Oxford, UK
Andrew Zisserman


Editor information

Editors and Affiliations

Institut für Informatik III, Universität Bonn, Römerstraße 164, 53117, Bonn, Germany
Daniel Cremers & Frank R. Schmidt
Institut für Informationsverarbeitung, Leibniz Universität Hannover, Appelstraße 9A, 30167, Hannover, Germany
Bodo Rosenhahn
Department of Statistics and Psychology, University of California - Los Angeles, 8967 Math Sciences Building, 90095-1554, Los Angeles, CA, USA
Alan L. Yuille

About this paper

Cite this paper

Ferrari, V., Marín-Jiménez, M., Zisserman, A. (2009). 2D Human Pose Estimation in TV Shows. In: Cremers, D., Rosenhahn, B., Yuille, A.L., Schmidt, F.R. (eds) Statistical and Geometrical Approaches to Visual Motion Analysis. Lecture Notes in Computer Science, vol 5604. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03061-1_7


DOI: https://doi.org/10.1007/978-3-642-03061-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03060-4
Online ISBN: 978-3-642-03061-1
eBook Packages: Computer Science, Computer Science (R0)
