SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images

  • Benjamin Coors
  • Alexandru Paul Condurache
  • Andreas Geiger
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)

Abstract

Omnidirectional cameras offer great benefits over classical cameras wherever a wide field of view is essential, such as in virtual reality applications or in autonomous robots. Unfortunately, standard convolutional neural networks are not well suited for this scenario as the natural projection surface is a sphere which cannot be unwrapped to a plane without introducing significant distortions, particularly in the polar regions. In this work, we present SphereNet, a novel deep learning framework which encodes invariance against such distortions explicitly into convolutional neural networks. Towards this goal, SphereNet adapts the sampling locations of the convolutional filters, effectively reversing distortions, and wraps the filters around the sphere. By building on regular convolutions, SphereNet enables the transfer of existing perspective convolutional neural network models to the omnidirectional case. We demonstrate the effectiveness of our method on the tasks of image classification and object detection, exploiting two newly created semi-synthetic and real-world omnidirectional datasets.
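The core idea of adapting filter sampling locations can be illustrated with a small sketch. The snippet below is not the authors' implementation; it is a minimal, hypothetical illustration of the underlying geometry: a regular kernel grid is placed on the tangent plane of the sphere at the filter's centre (a gnomonic projection) and projected back to spherical coordinates, which yields the distortion-corrected sampling pattern. The function name `sphere_sample_grid` and the grid spacing `step` are assumptions for illustration only.

```python
import numpy as np

def sphere_sample_grid(lat0, lon0, kernel=3, step=0.01):
    """Sampling locations (latitude, longitude, in radians) of a
    kernel x kernel filter centred at (lat0, lon0) on the unit sphere.

    A regular grid is laid out on the tangent plane at the centre and
    mapped back to the sphere via the inverse gnomonic projection, so
    the pattern stretches in longitude towards the poles, mirroring the
    distortion of an equirectangular image."""
    r = (kernel - 1) / 2
    # Regular grid on the tangent plane; `step` is the plane spacing.
    y, x = np.meshgrid(np.arange(-r, r + 1) * step,
                       np.arange(-r, r + 1) * step, indexing="ij")
    rho = np.sqrt(x ** 2 + y ** 2)
    nu = np.arctan(rho)  # angular distance from the kernel centre
    with np.errstate(invalid="ignore"):
        # Inverse gnomonic projection; the centre point (rho == 0) is
        # handled explicitly to avoid 0/0.
        lat = np.arcsin(
            np.cos(nu) * np.sin(lat0)
            + np.where(rho > 0, y * np.sin(nu) / rho, 0.0) * np.cos(lat0))
        lon = lon0 + np.arctan2(
            x * np.sin(nu),
            rho * np.cos(lat0) * np.cos(nu) - y * np.sin(lat0) * np.sin(nu))
    return lat, lon
```

Evaluating the grid near the equator versus near a pole shows the effect: the latitudinal extent stays roughly constant, while the longitudinal extent grows as the centre moves poleward, exactly the stretching that a naive planar kernel on an equirectangular image fails to account for.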

Supplementary material

Supplementary material 1: 474192_1_En_32_MOESM1_ESM.pdf (2.8 MB)


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Benjamin Coors (1, 3)
  • Alexandru Paul Condurache (2, 3)
  • Andreas Geiger (1)
  1. Autonomous Vision Group, MPI for Intelligent Systems and University of Tübingen, Tübingen, Germany
  2. Institute for Signal Processing, University of Lübeck, Lübeck, Germany
  3. Robert Bosch GmbH, Stuttgart, Germany
