An Empirical Study Towards Understanding How Deep Convolutional Nets Recognize Falls

  • Yan Zhang
  • Heiko Neumann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11134)

Abstract

Detecting unintended falls is essential for ambient intelligence and the healthcare of elderly people living alone. In recent years, deep convolutional nets have been widely used in human action analysis, and a number of fall detection methods build on them. Despite their strong performance, how these convolutional nets actually recognize falls remains unclear. In this paper, instead of proposing a novel approach, we perform a systematic empirical study to investigate the underlying fall recognition process. We design four investigation tasks that span five types of input modalities, seven net instances, and different sets of training samples. The quantitative and qualitative results reveal the patterns that the nets tend to learn, as well as several factors that can strongly influence fall recognition performance. We expect these conclusions to inform better deep learning solutions for fall detection systems.
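
As a minimal illustration of the kind of system under study, the sketch below classifies a stack of video frames as fall vs. no-fall and computes a vanilla gradient saliency map, one common way of inspecting what such nets respond to. The layer sizes, the stacked-grayscale-frame input, and the FallNet name are illustrative assumptions of ours, not the paper's actual architectures or modalities.

```python
# A minimal sketch, not the paper's pipeline: a tiny convolutional
# classifier over a stack of frames, plus gradient-based saliency.
import torch
import torch.nn as nn

class FallNet(nn.Module):
    def __init__(self, in_channels: int = 10):  # e.g. 10 stacked grayscale frames
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.LeakyReLU(0.01),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.LeakyReLU(0.01),
            nn.AdaptiveAvgPool2d(1),        # global pooling -> 64-d vector
        )
        self.classifier = nn.Linear(64, 2)  # logits for {no-fall, fall}

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

net = FallNet()
clip = torch.randn(1, 10, 112, 112, requires_grad=True)  # one dummy clip

# Vanilla gradient saliency: how strongly each input pixel influences
# the "fall" logit (class index 1).
net(clip)[0, 1].backward()
saliency = clip.grad.abs().amax(dim=1)  # max over frames -> (1, 112, 112)
print(saliency.shape)
```

With real data, one would first train such a net with a standard cross-entropy loss and only then inspect saliency maps; the paper probes several nets and input modalities in this spirit.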

Keywords

Deep convolutional nets · Fall recognition · Empirical study

Acknowledgements

This work was supported by a grant from the German Federal Ministry of Education and Research (BMBF) for the project SenseEmotion.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute of Neural Information Processing, Ulm University, Ulm, Germany
