RED: A Simple but Effective Baseline Predictor for the TrajNet Benchmark

  • Stefan Becker
  • Ronny Hug
  • Wolfgang Hübner
  • Michael Arens
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11131)

Abstract

In recent years, there has been a shift from modeling the tracking problem with Bayesian formulations towards using deep neural networks. To this end, this paper evaluates the effectiveness of various deep neural networks for predicting future pedestrian paths. Like traditional approaches, the analyzed deep networks rely solely on observed tracklets, without human-human interaction information. The evaluation is done on the publicly available TrajNet benchmark dataset [39], which builds up a repository of considerable and popular datasets for trajectory prediction. We show how a Recurrent-Encoder with a Dense layer stacked on top, referred to as the RED-predictor, achieves top rank at the TrajNet 2018 challenge compared to more elaborate models. Further, we investigate failure cases, explain the observed phenomena, and give some recommendations for overcoming the demonstrated shortcomings.
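
The architecture named in the abstract, a recurrent encoder with a dense layer on top, can be illustrated with a minimal, untrained sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the plain tanh recurrent cell, the use of position offsets as input, the hidden size, the 12-step horizon, the random weights, and the names `RedSketch`, `encode`, and `predict`.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix (stands in for trained weights)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RedSketch:
    """Hypothetical recurrent encoder + dense output layer (forward pass only)."""

    def __init__(self, hidden=8, horizon=12, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.horizon = horizon
        self.w_in = make_matrix(hidden, 2, rng)            # input -> hidden
        self.w_rec = make_matrix(hidden, hidden, rng)      # hidden -> hidden
        self.w_out = make_matrix(2 * horizon, hidden, rng)  # dense layer

    def encode(self, offsets):
        """Run the recurrent cell over the observed offsets; return final state."""
        h = [0.0] * self.hidden
        for step in offsets:
            a = matvec(self.w_in, step)
            b = matvec(self.w_rec, h)
            h = [math.tanh(x + y) for x, y in zip(a, b)]
        return h

    def predict(self, track):
        """Map an observed tracklet [(x, y), ...] to `horizon` future positions."""
        # Assumption: the encoder sees offsets between consecutive positions.
        offsets = [(x1 - x0, y1 - y0)
                   for (x0, y0), (x1, y1) in zip(track, track[1:])]
        h = self.encode(offsets)
        # The dense layer emits all future offsets in one shot (no decoder loop).
        flat = matvec(self.w_out, h)
        x, y = track[-1]
        path = []
        for i in range(self.horizon):
            x += flat[2 * i]
            y += flat[2 * i + 1]
            path.append((x, y))
        return path


if __name__ == "__main__":
    observed = [(float(i), 0.5 * i) for i in range(8)]
    for point in RedSketch().predict(observed):
        print(point)
```

Because the weights are random, the printed path is meaningless as a forecast; the sketch only shows how a single encoding of the observed tracklet could be mapped to the full predicted path by one dense layer, using only the tracklet itself as input.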

Keywords

Trajectory forecasting · Path prediction · Trajectory-based activity forecasting

Notes

Acknowledgements

The authors thank the organizers of the TrajNet challenge for providing a framework towards a more meaningful, standardized trajectory prediction benchmarking.

References

  1. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). https://www.tensorflow.org/, software available from tensorflow.org
  2. Akaike, H.: Fitting autoregressive models for prediction. Ann. Inst. Stat. Math. 21(1), 243–247 (1969). http://EconPapers.repec.org/RePEc:spr:aistmt:v:21:y:1969:i:1:p:243-247
  3. Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., Savarese, S.: Social LSTM: human trajectory prediction in crowded spaces. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 961–971. IEEE (2016)
  4. Alahi, A., et al.: Learning to predict human behaviour in crowded scenes. In: Group and Crowd Behavior for Computer Vision. Elsevier (2017)
  5. Bai, S., Kolter, J.Z., Koltun, V.: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint abs/1803.01271 (2018). http://arxiv.org/abs/1803.01271
  6. Ballan, L., Castaldo, F., Alahi, A., Palmieri, F., Savarese, S.: Knowledge transfer for scene-specific motion prediction. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 697–713. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_42
  7. Brownlee, J.: Introduction to time series forecasting with python: how to prepare data and develop models to predict the future (2017). https://books.google.de/books?id=bA5ItAEACAAJ
  8. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734. Association for Computational Linguistics, Doha, Qatar (2014). http://www.aclweb.org/anthology/D14-1179
  9. Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A., Bengio, Y.: A recurrent latent variable model for sequential data. In: Advances in Neural Information Processing Systems (NIPS) (2015)
  10. Coscia, P., Castaldo, F., Palmieri, F.A., Alahi, A., Savarese, S., Ballan, L.: Long-term path prediction in urban scenarios using circular distributions. Image Vis. Comput. 69, 81–91 (2018). https://doi.org/10.1016/j.imavis.2017.11.006. http://www.sciencedirect.com/science/article/pii/S0262885617301853
  11. Donahue, J., et al.: Long-term recurrent convolutional networks for visual recognition and description. In: Conference on Computer Vision and Pattern Recognition. IEEE (2015)
  12. Draper, N.R., Smith, H.: Applied Regression Analysis. Wiley Series in Probability and Mathematical Statistics. Wiley, New York (1966)
  13. Ellis, D., Sommerlade, E., Reid, I.: Modelling pedestrian trajectory patterns with Gaussian processes. In: International Conference on Computer Vision Workshops (ICCVW), pp. 1229–1234. IEEE (2009). https://doi.org/10.1109/ICCVW.2009.5457470
  14. Ferryman, J., Shahrokni, A.: Pets 2009: dataset and challenge. In: IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS), pp. 1–6 (2009). https://doi.org/10.1109/PETS-WINTER.2009.5399556
  15. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks. In: International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649 (2013). https://doi.org/10.1109/ICASSP.2013.6638947
  16. Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., Alahi, A.: Social GAN: socially acceptable trajectories with generative adversarial networks. In: Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2018)
  17. Hasan, I., Setti, F., Tsesmelis, T., Bue, A.D., Galasso, F., Cristani, M.: MX-LSTM: mixing tracklets and vislets to jointly forecast trajectories and head poses. In: Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2018)
  18. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
  19. Helbing, D., Molnár, P.: Social force model for pedestrian dynamics. Phys. Rev. E 51, 4282–4286 (1995). https://doi.org/10.1103/PhysRevE.51.4282. https://link.aps.org/doi/10.1103/PhysRevE.51.4282
  20. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
  21. Huang, S., et al.: Deep learning driven visual path prediction from a single image. IEEE Trans. Image Process. 25(12), 5892–5904 (2016). https://doi.org/10.1109/TIP.2016.2613686
  22. Huber, M.: Nonlinear Gaussian filtering: theory, algorithms, and applications. Ph.D. thesis, Karlsruhe Institute of Technology (KIT) (2015)
  23. Hug, R., Becker, S., Hübner, W., Arens, M.: On the reliability of LSTM-MDL models for predicting pedestrian trajectories. In: Representations, Analysis and Recognition of Shape and Motion from Imaging Data (RFMI), Savoie, France (2017)
  24. Hug, R., Becker, S., Hübner, W., Arens, M.: Particle-based pedestrian path prediction using LSTM-MDL models. In: IEEE International Conference on Intelligent Transportation Systems (ITSC) (2018). http://arxiv.org/abs/1804.05546
  25. Kalman, R.E.: A new approach to linear filtering and prediction problems. ASME J. Basic Eng. 82, 35–45 (1960)
  26. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference for Learning Representations (ICLR) (2015)
  27. Kitani, K.M., Ziebart, B.D., Bagnell, J.A., Hebert, M.: Activity forecasting. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 201–214. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_15
  28. Kooij, J.F.P., Schneider, N., Flohr, F., Gavrila, D.M.: Context-based pedestrian path prediction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 618–633. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_40
  29. Lee, N., Choi, W., Vernaza, P., Choy, C.B., Torr, P.H.S., Chandraker, M.: Desire: distant future prediction in dynamic scenes with interacting agents. In: Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2017)
  30. Lerner, A., Chrysanthou, Y., Lischinski, D.: Crowds by example. Comput. Graph. Forum 26(3), 655–664 (2007)
  31. Li, Z., Zhou, Y., Xiao, S., He, C., Li, H.: Auto-conditioned LSTM network for extended complex human motion synthesis. arXiv preprint abs/1707.05363 (2017). http://arxiv.org/abs/1707.05363
  32. Ma, W., Huang, D., Lee, N., Kitani, K.M.: Forecasting interactive dynamics of pedestrians with fictitious play. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4636–4644. IEEE (2017). https://doi.org/10.1109/CVPR.2017.493
  33. Martinez, J., Black, M.J., Romero, J.: On human motion prediction using recurrent neural networks. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4674–4683. IEEE (2017). https://doi.org/10.1109/CVPR.2017.497
  34. McCullagh, P., Nelder, J.A.: Generalized Linear Models. Chapman & Hall, CRC, London (1989)
  35. van den Oord, A., et al.: Wavenet: a generative model for raw audio. arXiv preprint abs/1609.03499 (2016). http://arxiv.org/abs/1609.03499
  36. Pellegrini, S., Ess, A., Schindler, K., van Gool, L.: You’ll never walk alone: modeling social behavior for multi-target tracking. In: International Conference on Computer Vision, pp. 261–268. IEEE (2009). https://doi.org/10.1109/ICCV.2009.5459260
  37. Priestley, M.B.: Spectral Analysis and Time Series. Academic Press, London, New York (1981)
  38. Robicquet, A., Sadeghian, A., Alahi, A., Savarese, S.: Learning social etiquette: human trajectory understanding in crowded scenes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 549–565. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_33
  39. Sadeghian, A., Kosaraju, V., Gupta, A., Savarese, S., Alahi, A.: Trajnet: towards a benchmark for human trajectory prediction. arXiv preprint (2018)
  40. Sadeghian, A., Kosaraju, V., Sadeghian, A., Hirose, N., Savarese, S.: SoPhie: an attentive GAN for predicting paths compliant to social and physical constraints. arXiv preprint arXiv:1806.01482 (2018)
  41. Vemula, A., Muelling, K., Oh, J.: Modeling cooperative navigation in dense human crowds. In: International Conference on Robotics and Automation (ICRA), pp. 1685–1692. IEEE, May 2017. https://doi.org/10.1109/ICRA.2017.7989199
  42. Williams, C.K.I.: Prediction with Gaussian processes: from linear regression to linear prediction and beyond. In: Jordan, M.I. (ed.) Learning in Graphical Models. NATO ASI Series, pp. 599–621. Springer, Dordrecht (1998). https://doi.org/10.1007/978-94-011-5014-9_23
  43. Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: Bach, F., Blei, D. (eds.) International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 37, pp. 2048–2057. PMLR, Lille, France (2015)
  44. Xue, H., Huynh, D.Q., Reynolds, H.M.: SS-LSTM: a hierarchical LSTM model for pedestrian trajectory prediction. In: Winter Conference on Applications of Computer Vision (WACV). IEEE (2018)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Fraunhofer Institute for Optronics, System Technologies, and Image Exploitation IOSB, Ettlingen, Germany