
Data Driven Sensing for Action Recognition Using Deep Convolutional Neural Networks

  • Ronak Gupta
  • Prashant Anand
  • Vinay Kaushik
  • Santanu Chaudhury
  • Brejesh Lall
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11941)

Abstract

Tasks such as action recognition require high-quality features for accurate inference, but the high resolution and large volume of video data pose a significant challenge in terms of storage and computational complexity. Compressive sensing is a potential solution to these problems, yet reconstruction at high compression ratios has been shown to lose information. A framework is therefore needed that performs good-quality action recognition directly on compressively sensed data. In this paper, we present data-driven sensing for spatial multiplexers trained with a combined mean square error (MSE) and perceptual loss using deep convolutional neural networks. We employ subpixel convolutional layers in a 2D convolutional encoder-decoder model: the encoder learns downscaling filters that map the input from a higher to a lower dimension, and the decoder learns the reverse, i.e., upscaling filters. We stack this encoder with an Inflated 3D ConvNet (I3D) and train the cascaded network with cross-entropy loss for action recognition. After encoding the data and undersampling it by a factor of over 100 (10 × 10) relative to the input size, we obtain 75.05% accuracy on UCF-101 and 50.39% accuracy on HMDB-51 with our proposed architecture, setting a baseline for reconstruction-free action recognition with data-driven sensing using deep learning. We infer experimentally that the encoded information from such spatial multiplexers can be used directly for action recognition.
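The space-to-depth/depth-to-space rearrangement underlying the subpixel convolutional layers can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: `phi` stands in for the learned 1x1-convolution sensing weights (hypothetical here, and random rather than trained with the combined MSE and perceptual loss), and a 160 × 160 grayscale frame is assumed purely for the shape arithmetic.

```python
import numpy as np

def space_to_depth(x, r):
    # (H, W, C) -> (H/r, W/r, C*r*r): rearrange each r x r spatial block
    # into channels, as the encoder's downscaling layers do.
    h, w, c = x.shape
    x = x.reshape(h // r, r, w // r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h // r, w // r, c * r * r)

def depth_to_space(x, r):
    # Inverse rearrangement, (H/r, W/r, C*r*r) -> (H, W, C): the subpixel
    # upscaling used by the decoder.
    h, w, c = x.shape
    out_c = c // (r * r)
    x = x.reshape(h, w, r, r, out_c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h * r, w * r, out_c)

rng = np.random.default_rng(0)
frame = rng.random((160, 160, 1))      # one grayscale video frame (assumed size)
blocks = space_to_depth(frame, 10)     # (16, 16, 100): each 10 x 10 block as channels
phi = rng.random((100, 1))             # hypothetical learned 1x1-conv sensing weights
measurements = blocks @ phi            # (16, 16, 1): ~100x fewer values than the frame

# The rearrangement itself is lossless; compression comes from the projection.
assert np.allclose(depth_to_space(blocks, 10), frame)
```

The rearrangement alone is invertible; the roughly 100× undersampling comes from projecting the 100 channels of each 10 × 10 block down to a single measurement, which is the part the paper learns end to end.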

Keywords

Data-driven compressive sensing (CS) · 3D deep convolutional neural networks (DCNN) · Perceptual compression · Reconstruction-free action recognition

Acknowledgment

The NVIDIA DGX-1 used for the experiments was provided by CSIR-CEERI, Pilani, India.

References

  1. Mousavi, A., Dasarathy, G., Baraniuk, R.G.: DeepCodec: adaptive sensing and recovery via deep convolutional neural networks. arXiv preprint arXiv:1707.03386 (2017)
  2. Xu, K., Ren, F.: CSVideoNet: a real-time end-to-end learning framework for high-frame-rate video compressive sensing. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1680–1688. IEEE (2018)
  3. Lohit, S., Singh, R., Kulkarni, K., Turaga, P.: Rate-adaptive neural networks for spatial multiplexers. arXiv preprint arXiv:1809.02850 (2018)
  4. Shi, W., et al.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1874–1883 (2016)
  5. Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43
  6. Blau, Y., Mechrez, R., Timofte, R., Michaeli, T., Zelnik-Manor, L.: The 2018 PIRM challenge on perceptual image super-resolution. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS, vol. 11133, pp. 334–355. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-11021-5_21
  7. Mousavi, A., Patel, A.B., Baraniuk, R.G.: A deep learning approach to structured signal recovery. In: 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1336–1343. IEEE (2015)
  8. Kulkarni, K., Lohit, S., Turaga, P., Kerviche, R., Ashok, A.: ReconNet: non-iterative reconstruction of images from compressively sensed measurements. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 449–458 (2016)
  9. Kulkarni, K., Turaga, P.: Recurrence textures for human activity recognition from compressive cameras. In: 2012 19th IEEE International Conference on Image Processing (ICIP), pp. 1417–1420. IEEE (2012)
  10. Kulkarni, K., Turaga, P.: Reconstruction-free action inference from compressive imagers. IEEE Trans. Pattern Anal. Mach. Intell. 38(4), 772–784 (2016)
  11. Adler, A., Elad, M., Zibulevsky, M.: Compressed learning: a deep neural network approach. arXiv preprint arXiv:1610.09615 (2016)
  12. Lohit, S., Kulkarni, K., Turaga, P.: Direct inference on compressive measurements using convolutional neural networks. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 1913–1917. IEEE (2016)
  13. Zisselman, E., Adler, A., Elad, M.: Compressed learning for image classification: a deep neural network approach. Process. Anal. Learn. Images Shapes Forms 19, 1 (2018)
  14. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
  15. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  16. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4724–4733. IEEE (2017)
  17. Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)
  18. Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2556–2563. IEEE (2011)
  19. Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26(3), 301–321 (2009)
  20. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3551–3558 (2013)
  21. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015)
  22. Chadha, A., Abbas, A., Andreopoulos, Y.: Compressed-domain video classification with deep neural networks: there's way too much information to decode the matrix. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 1832–1836. IEEE (2017)
  23. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). Software available from tensorflow.org. http://tensorflow.org/
  24. NVIDIA GPU Cloud TensorFlow container. NVIDIA offers GPU-accelerated containers via NVIDIA GPU Cloud (NGC) for use on DGX systems. https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Ronak Gupta (1)
  • Prashant Anand (1)
  • Vinay Kaushik (1)
  • Santanu Chaudhury (1, 2)
  • Brejesh Lall (1)
  1. Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, India
  2. Indian Institute of Technology Jodhpur, Jodhpur, India
