FlowCap: 2D Human Pose from Optical Flow

  • Javier Romero
  • Matthew Loper
  • Michael J. Black
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9358)


We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow can provide information about 2D body pose. Like range data, flow is largely invariant to appearance but unlike depth it can be directly computed from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, when people stop moving, there is no optical flow and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and velocities over time and a regression method to predict 2D body pose from part centers. No range sensor is required and FlowCap estimates 2D human pose from monocular video sources containing human motion. Such sources include hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation.
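The tracking step described above, propagating body part positions and velocities so that estimates persist when motion (and hence optical flow) vanishes, can be sketched with a constant-velocity Kalman filter. This is an illustrative reconstruction, not the paper's implementation: the state layout, noise parameters, and the `PartKalman` class are assumptions for the sketch.

```python
import numpy as np

class PartKalman:
    """Constant-velocity Kalman filter for one body part center.

    State is [x, y, vx, vy]; only the 2D position is observed (e.g. a
    part center from a flow-based detector). All noise magnitudes here
    are illustrative assumptions, not the paper's actual parameters.
    """

    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4)
        # Constant-velocity motion model: position += velocity * dt.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Measurement model: we observe position only.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)  # process noise (assumed)
        self.R = r * np.eye(2)  # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        # z is a detected part center, or None when the detector fires
        # nothing (no flow); in that case we keep the prediction, so the
        # part does not "disappear" when the person stops moving.
        if z is None:
            return self.x[:2]
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

Feeding the filter a few detections of a part moving at constant speed lets it learn a velocity estimate; when detections then drop out (no flow), `predict` coasts the position forward instead of losing the part.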



Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Javier Romero¹
  • Matthew Loper¹
  • Michael J. Black¹

  1. Max Planck Institute for Intelligent Systems, Tübingen, Germany
