Compressing the Input for CNNs with the First-Order Scattering Transform

  • Edouard Oyallon
  • Eugene Belilovsky
  • Sergey Zagoruyko
  • Michal Valko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)


We study the first-order scattering transform as a candidate for reducing the size of the signal processed by a convolutional neural network (CNN). We present theoretical and empirical evidence that, for natural images and sufficiently small translation invariance, this transform preserves most of the signal information needed for classification while substantially reducing the spatial resolution and total signal size. We demonstrate that a CNN cascaded on top of this representation performs on par with ImageNet classification models commonly used in downstream tasks, such as ResNet-50. We then use our trained hybrid ImageNet model as the base model of a detection system, a setting that typically involves larger input images. On the Pascal VOC and COCO detection tasks, we demonstrate improvements in inference speed and training memory consumption compared to models trained directly on the input image.
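To make the size reduction concrete, the following is a minimal NumPy sketch of a first-order scattering transform. It is not the authors' implementation. It assumes plain Gabor filters defined in the Fourier domain (standing in for the Morlet wavelets usually used), periodic boundary conditions, and illustrative parameters: J scales, L orientations, and the hypothetical constants `xi0` and `sigma0`. None of these values are taken from the abstract. Each input channel yields one low-pass channel plus J·L wavelet-modulus channels, all subsampled by 2^J along each spatial axis.

```python
import numpy as np


def filter_bank(n, J, L, xi0=3 * np.pi / 4, sigma0=0.8):
    """Fourier-domain Gabor band-pass filters and a Gaussian low-pass filter.

    xi0 (finest centre frequency) and sigma0 (finest spatial width) are
    illustrative assumptions, not values from the paper.
    """
    w = 2 * np.pi * np.fft.fftfreq(n)            # radians/pixel, FFT ordering
    wy, wx = np.meshgrid(w, w, indexing="ij")
    psi_hat = []
    for j in range(J):
        sigma, xi = sigma0 * 2 ** j, xi0 / 2 ** j
        for k in range(L):
            theta = np.pi * k / L
            cx, cy = xi * np.cos(theta), xi * np.sin(theta)
            # Gaussian bump in frequency centred at (cx, cy)
            psi_hat.append(
                np.exp(-0.5 * sigma ** 2 * ((wx - cx) ** 2 + (wy - cy) ** 2))
            )
    # Low-pass filter at the coarsest scale 2**J; equals 1 at zero frequency
    phi_hat = np.exp(-0.5 * (sigma0 * 2 ** J) ** 2 * (wx ** 2 + wy ** 2))
    return np.stack(psi_hat), phi_hat


def scattering_order1(x, J=3, L=8):
    """Map x of shape (C, N, N) to shape (C * (1 + J * L), N / 2**J, N / 2**J)."""
    n = x.shape[-1]
    psi_hat, phi_hat = filter_bank(n, J, L)
    x_hat = np.fft.fft2(x)                        # FFT over the last two axes
    # Wavelet modulus |x * psi_{j,theta}|, shape (C, J*L, N, N)
    u1 = np.abs(np.fft.ifft2(x_hat[:, None] * psi_hat[None]))
    # Zeroth order (the image itself) stacked with the first-order terms
    u = np.concatenate([x[:, None], u1], axis=1)
    # Low-pass filter every channel, then subsample by 2**J
    s = np.fft.ifft2(np.fft.fft2(u) * phi_hat).real
    s = s[..., :: 2 ** J, :: 2 ** J]
    return s.reshape(-1, *s.shape[-2:])


# Small usage example on a random 3-channel image
image = np.random.default_rng(0).standard_normal((3, 64, 64))
coeffs = scattering_order1(image, J=3, L=8)
print(image.shape, "->", coeffs.shape, "size ratio:", coeffs.size / image.size)
```

Under these assumed settings the channel count grows by a factor of 1 + J·L while each spatial axis shrinks by 2^J, so the total number of values drops whenever 1 + J·L is smaller than 4^J.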


Keywords: CNN, SIFT, Image descriptors, First-order scattering



E. Oyallon was supported by a GPU donation from NVIDIA and partially supported by a grant from the DPEI of Inria (AAR 2017POD057) for the collaboration with CWI. S. Zagoruyko was supported by the DGA RAPID project DRAAF. The research presented was also supported by European CHIST-ERA project DELTA, French Ministry of Higher Education and Research, Nord-Pas-de-Calais Regional Council, Inria and Otto-von-Guericke-Universität Magdeburg associated-team north-European project Allocate, and French National Research Agency projects ExTra-Learn (n.ANR-14-CE24-0010-01) and BoB (n.ANR-16-CE23-0003).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Edouard Oyallon (1, 4, 5), corresponding author
  • Eugene Belilovsky (2)
  • Sergey Zagoruyko (3)
  • Michal Valko (4)

  1. CentraleSupelec, Université Paris-Saclay, Gif-sur-Yvette, France
  2. MILA, University of Montreal, Montreal, Canada
  3. WILLOW – Inria Paris, Paris, France
  4. SequeL – Inria Lille, Lille, France
  5. GALEN – Inria Saclay, Palaiseau, France
