Creatures Great and SMAL: Recovering the Shape and Motion of Animals from Video

Biggs, Benjamin; Roddick, Thomas; Fitzgibbon, Andrew; Cipolla, Roberto

doi:10.1007/978-3-030-20873-8_1

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11365)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

We present a system to recover the 3D shape and motion of a wide variety of quadrupeds from video. The system comprises a machine learning front-end which predicts candidate 2D joint positions, a discrete optimization which finds kinematically plausible joint correspondences, and an energy minimization stage which fits a detailed 3D model to the image. In order to overcome the limited availability of motion capture training data from animals, and the difficulty of generating realistic synthetic training images, the system is designed to work on silhouette data. The joint candidate predictor is trained on synthetically generated silhouette images, and at test time, deep learning methods or standard video segmentation tools are used to extract silhouettes from real data. The system is tested on animal videos from several species, and shows accurate reconstructions of 3D shape and pose.
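
The middle stage described above, choosing one joint position from several candidates, can be illustrated with a small sketch. This is not the authors' formulation: the abstract only states that a discrete optimization finds kinematically plausible joint correspondences. As a stand-in, the sketch below uses Viterbi-style dynamic programming to pick one candidate per frame for a single joint, trading detector confidence against frame-to-frame displacement. The candidate format `(x, y, confidence)`, the function name `select_candidates`, and the `motion_weight` parameter are all assumptions made for illustration.

```python
import math


def select_candidates(frames, motion_weight=1.0):
    """Pick one candidate per frame for a single joint (illustrative only).

    frames: list of frames; each frame is a non-empty list of
            (x, y, confidence) tuples. This format is hypothetical.
    motion_weight: assumed trade-off between confidence and smooth motion.
    Returns a list with the index of the chosen candidate in each frame.
    """
    if not frames:
        return []

    def unary(cand):
        # Low-confidence candidates are expensive.
        return -math.log(max(cand[2], 1e-9))

    def pairwise(a, b):
        # Large jumps between consecutive frames are expensive.
        return motion_weight * math.hypot(a[0] - b[0], a[1] - b[1])

    # cost[i] = cheapest total cost of any path ending at candidate i.
    cost = [unary(c) for c in frames[0]]
    back = []  # back[t][i] = best predecessor index for candidate i
    for prev, cur in zip(frames, frames[1:]):
        new_cost, pointers = [], []
        for c in cur:
            best_j = min(range(len(prev)),
                         key=lambda j: cost[j] + pairwise(prev[j], c))
            new_cost.append(cost[best_j] + pairwise(prev[best_j], c) + unary(c))
            pointers.append(best_j)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path backwards from the last frame.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return path


# Three frames, two candidates each: one smooth track near the origin and
# some confident but spatially inconsistent distractors.
frames = [
    [(0, 0, 0.90), (100, 100, 0.95)],
    [(1, 1, 0.90), (50, 50, 0.99)],
    [(2, 2, 0.90), (100, 100, 0.50)],
]
smooth_path = select_candidates(frames, motion_weight=1.0)
greedy_path = select_candidates(frames, motion_weight=0.0)
```

With a positive `motion_weight` the smooth track is preferred even though the distractors score higher in some frames; with `motion_weight=0.0` the selection reduces to taking the most confident candidate in each frame. A full system would also need constraints between joints, which this single-joint sketch omits.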

Acknowledgment

The authors would like to thank GlaxoSmithKline for sponsoring this work.

Author information

Authors and Affiliations

Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK
Benjamin Biggs, Thomas Roddick & Roberto Cipolla
Microsoft Research, 21 Station Road, Cambridge, CB1 2FB, UK
Andrew Fitzgibbon

Corresponding authors

Correspondence to Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon or Roberto Cipolla.

Editor information

Editors and Affiliations

IIIT Hyderabad, Hyderabad, India
C.V. Jawahar
ANU, Canberra, ACT, Australia
Hongdong Li
Simon Fraser University, Burnaby, BC, Canada
Greg Mori
ETH Zurich, Zurich, Zürich, Switzerland
Konrad Schindler

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 2 (mp4 88929 KB)

Supplementary material 1 (pdf 1056 KB)

About this paper

Cite this paper

Biggs, B., Roddick, T., Fitzgibbon, A., Cipolla, R. (2019). Creatures Great and SMAL: Recovering the Shape and Motion of Animals from Video. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11365. Springer, Cham. https://doi.org/10.1007/978-3-030-20873-8_1

DOI: https://doi.org/10.1007/978-3-030-20873-8_1
Published: 26 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20872-1
Online ISBN: 978-3-030-20873-8
eBook Packages: Computer Science, Computer Science (R0)