Abstract
We estimate 2D human pose from video using only optical flow. The key insight is that dense optical flow can provide information about 2D body pose. Like range data, flow is largely invariant to appearance, but unlike depth it can be computed directly from monocular video. We demonstrate that body parts can be detected from dense flow using the same random forest approach used by the Microsoft Kinect. Unlike range data, however, when people stop moving there is no optical flow, and they effectively disappear. To address this, our FlowCap method uses a Kalman filter to propagate body part positions and velocities over time, and a regression method to predict 2D body pose from part centers. No range sensor is required, and FlowCap estimates 2D human pose from monocular video sources containing human motion, including hand-held phone cameras and archival television video. We demonstrate 2D body pose estimation in a range of scenarios and show that the method works with real-time optical flow. The results suggest that optical flow shares invariances with range data that, when complemented with tracking, make it valuable for pose estimation.
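The tracking step described above, propagating each part's position and velocity when flow vanishes, can be sketched with a constant-velocity Kalman filter. This is an illustrative sketch only, not the authors' implementation; the class name, the unit frame interval, and the noise parameters `q` and `r` are hypothetical choices.

```python
import numpy as np

class PartKalman:
    """Constant-velocity Kalman filter for one 2D body-part center.

    Illustrative sketch: state is [x, y, vx, vy]; the noise levels
    q (process) and r (measurement) are hypothetical, not from the paper.
    """

    def __init__(self, x0, y0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])   # position and velocity
        self.P = np.eye(4)                      # state covariance
        self.F = np.eye(4)                      # transition: x += vx, y += vy
        self.F[0, 2] = self.F[1, 3] = 1.0       # dt = 1 frame
        self.H = np.zeros((2, 4))               # we observe position only
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4)                  # process noise
        self.R = r * np.eye(2)                  # measurement noise

    def predict(self):
        # Coast on the current velocity estimate; this is what keeps a
        # part alive when the person stops moving and flow disappears.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        # Fuse a detected part center z = (x, y) from the flow-based detector.
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In use, each frame calls `predict()`; when the flow-based detector fires, its part center is fused via `update()`, and when it does not (no motion, no flow), the filter simply coasts on the last velocity estimate.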
© 2015 Springer International Publishing Switzerland
Cite this paper
Romero, J., Loper, M., Black, M.J. (2015). FlowCap: 2D Human Pose from Optical Flow. In: Gall, J., Gehler, P., Leibe, B. (eds) Pattern Recognition. DAGM 2015. Lecture Notes in Computer Science(), vol 9358. Springer, Cham. https://doi.org/10.1007/978-3-319-24947-6_34
Print ISBN: 978-3-319-24946-9
Online ISBN: 978-3-319-24947-6