Recognizing People in Blind Spots Based on Surrounding Behavior

  • Kensho Hara
  • Hirokatsu Kataoka
  • Masaki Inaba
  • Kenichi Narioka
  • Yutaka Satoh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)

Abstract

Recent advances in computer vision have achieved remarkable performance improvements. These technologies, however, mainly focus on recognizing visible targets, whereas in real situations many targets are invisible because they lie in blind spots. Humans can often recognize such invisible targets from the context around them (e.g., visible human behavior and the surrounding environment), and use this kind of inference to predict situations in blind spots on a daily basis. As a first step toward recognizing targets in blind spots captured in videos, we propose a convolutional neural network that recognizes whether or not a person is present in a blind spot. In experiments on the Volleyball dataset, which includes various interactions of players, combined with artificial occlusions, our proposed method achieved 90.3% recognition accuracy.
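The experimental setup described above, placing an artificial occlusion over video clips and labeling each clip by whether a person is hidden inside it, can be sketched as follows. This is a minimal illustration, not the authors' code: the array shapes, the `occlude` helper, and the containment rule for the label are all assumptions made for the example.

```python
import numpy as np

def occlude(frames, box):
    """Black out a rectangular region in every frame to simulate a blind spot.

    frames: (T, H, W, C) uint8 video clip
    box: (y0, y1, x0, x1) region to mask
    Returns a new occluded clip; the input is left untouched.
    """
    y0, y1, x0, x1 = box
    out = frames.copy()
    out[:, y0:y1, x0:x1, :] = 0  # fill the blind-spot region with black
    return out

def make_sample(frames, person_box, occlusion_box):
    """Build one (clip, label) training pair.

    label = 1 if the person's bounding box lies entirely inside the
    occluded region (the person is 'in the blind spot'), else 0.
    """
    py0, py1, px0, px1 = person_box
    oy0, oy1, ox0, ox1 = occlusion_box
    hidden = py0 >= oy0 and py1 <= oy1 and px0 >= ox0 and px1 <= ox1
    return occlude(frames, occlusion_box), int(hidden)

# Tiny example: a 4-frame, 8x8, 3-channel all-white clip
clip = np.full((4, 8, 8, 3), 255, dtype=np.uint8)
occluded, label = make_sample(
    clip, person_box=(2, 4, 2, 4), occlusion_box=(0, 6, 0, 6)
)
```

A spatiotemporal CNN would then be trained on such `(clip, label)` pairs to predict presence of a person in the blind spot from the surrounding visible behavior alone.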

Keywords

Action recognition · Convolutional neural networks

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Kensho Hara (1)
  • Hirokatsu Kataoka (1)
  • Masaki Inaba (2)
  • Kenichi Narioka (2)
  • Yutaka Satoh (1)
  1. National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
  2. DENSO CORPORATION, Chuo-ku, Japan
