Abstract
In this work we present a convolutional neural network (CNN)-based model that predicts the future movement of a ball from a series of images depicting the ball and its environment. For training and evaluation, we use artificially generated image sequences. Two scenarios are analyzed: prediction in a simple table tennis environment and in a more challenging squash environment. Classical 2D convolution layers are compared with 3D convolution layers, which extract the motion information of the ball from contiguous frames. Moreover, we investigate whether networks with stereo visual input perform better than those with monocular vision only. Our experiments suggest that CNNs can indeed predict physical behaviour with small error rates on unseen data, but that performance drops for very complex underlying movements.
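The contrast between 2D and 3D convolution drawn in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the function names (`conv3d_valid`, `conv2d_stacked`, `make_frames`), the one-pixel "ball" and the hand-picked kernels are all assumptions made for illustration, and plain Python lists stand in for a deep learning framework. The sketch shows that a 3D kernel spanning only a few contiguous frames keeps a time axis in its output and can respond to motion, whereas a 2D convolution over frames stacked as channels collapses time in a single step.

```python
def conv3d_valid(frames, kernel):
    """Valid (no padding) 3D cross-correlation.

    frames: nested list of shape T x H x W
    kernel: nested list of shape kt x kh x kw
    returns: nested list of shape (T-kt+1) x (H-kh+1) x (W-kw+1)
    """
    T, H, W = len(frames), len(frames[0]), len(frames[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += frames[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def conv2d_stacked(frames, kernel):
    """2D convolution with the T frames stacked as input channels.

    Equivalent to a 3D convolution whose kernel spans all T frames,
    so the time axis disappears from the output (shape H' x W').
    """
    assert len(kernel) == len(frames), "kernel needs one slice per frame"
    return conv3d_valid(frames, kernel)[0]


def make_frames(num_frames, height, width, row, start_col, velocity):
    """Hypothetical synthetic clip: a one-pixel 'ball' moving horizontally."""
    frames = []
    for t in range(num_frames):
        frame = [[0.0] * width for _ in range(height)]
        frame[row][start_col + velocity * t] = 1.0
        frames.append(frame)
    return frames


frames = make_frames(num_frames=4, height=3, width=6, row=1, start_col=0, velocity=1)

# 3D kernel spanning two contiguous frames: next frame minus current frame.
diff_kernel = [[[-1.0]], [[1.0]]]
motion = conv3d_valid(frames, diff_kernel)      # shape 3 x 3 x 6, time axis kept

# 2D convolution over all four frames as channels: simple temporal average.
avg_kernel = [[[0.25]] for _ in range(4)]
collapsed = conv2d_stacked(frames, avg_kernel)  # shape 3 x 6, time axis gone

print(motion[0][1])   # -1 where the ball was, +1 where it moved to
print(collapsed[1])   # a smeared trail with no direction information
```

In the 3D output each time step marks where the ball left and where it arrived, so direction and speed remain recoverable by later layers. The stacked 2D output here is a direction-free trail; a learned 2D kernel could of course weight the frames differently, so this only illustrates the structural difference, not a performance claim.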
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Warnecke, A., Lüddecke, T., Wörgötter, F. (2017). Convolutional Neural Networks for Movement Prediction in Videos. In: Roth, V., Vetter, T. (eds) Pattern Recognition. GCPR 2017. Lecture Notes in Computer Science, vol 10496. Springer, Cham. https://doi.org/10.1007/978-3-319-66709-6_18
DOI: https://doi.org/10.1007/978-3-319-66709-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66708-9
Online ISBN: 978-3-319-66709-6
eBook Packages: Computer Science, Computer Science (R0)