Learning Sparse Features with an Auto-Associator

  • Sébastien Rebecchi
  • Hélène Paugam-Moisy
  • Michèle Sebag
Part of the Studies in Computational Intelligence book series (SCI, volume 557)


A major issue in statistical machine learning is the design of a representation, or feature space, facilitating the resolution of the learning task at hand. Sparse representations in particular facilitate discriminant learning: on the one hand, they are robust to noise; on the other hand, they disentangle the factors of variation mixed up in dense representations, favoring the separability and interpretation of data. This chapter focuses on auto-associators (AAs), i.e., multi-layer neural networks trained to encode/decode the data and thus de facto defining a feature space. AAs, first investigated in the 1980s, were recently reconsidered as building blocks for deep neural networks. This chapter surveys related work on building sparse representations, and presents a new non-linear explicit sparse representation method referred to as Sparse Auto-Associator (SAA), integrating a sparsity objective within the standard auto-associator learning criterion. The comparative empirical validation of SAAs on state-of-the-art handwritten digit recognition benchmarks shows that SAAs outperform standard auto-associators in terms of classification performance and yield results similar to those of denoising auto-associators. Furthermore, SAAs make it possible to control the representation size to some extent, through a conservative pruning of the feature space.
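The idea of integrating a sparsity objective within the auto-associator learning criterion can be illustrated with a minimal sketch: an auto-associator whose reconstruction loss is augmented with an L1 penalty on the hidden code, trained by plain stochastic gradient descent. This is an illustrative toy on synthetic data, not the authors' exact SAA training procedure; all names, sizes, and hyper-parameters below are assumptions made for the example.

```python
import numpy as np

# Illustrative sparse auto-associator: reconstruction loss + L1 penalty on
# the hidden code. Sizes and hyper-parameters are arbitrary assumptions.
rng = np.random.default_rng(0)
n_in, n_hid = 16, 32                    # over-complete hidden layer
W = rng.normal(0, 0.1, (n_hid, n_in))   # encoder weights
b = np.zeros(n_hid)                     # encoder bias
V = rng.normal(0, 0.1, (n_in, n_hid))   # decoder weights
c = np.zeros(n_in)                      # decoder bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(W @ x + b)              # hidden code (pushed toward sparsity)
    return h, V @ h + c                 # code and linear reconstruction

def mse(X):
    return np.mean([(forward(x)[1] - x) ** 2 for x in X])

lam, lr = 1e-3, 0.05                    # sparsity weight, learning rate
X = rng.normal(size=(200, n_in))        # synthetic data
mse_init = mse(X)

for epoch in range(30):
    for x in X:
        h, x_hat = forward(x)
        err = x_hat - x                 # reconstruction error
        # gradients of 0.5*||x_hat - x||^2 + lam*||h||_1
        dh = V.T @ err + lam * np.sign(h)
        dz = dh * h * (1.0 - h)         # back through the sigmoid
        V -= lr * np.outer(err, h); c -= lr * err
        W -= lr * np.outer(dz, x);  b -= lr * dz

mse_final = mse(X)
print(mse_init, "->", mse_final)        # reconstruction error decreases
```

The L1 term shrinks hidden activations toward zero, so only a few units stay strongly active per input; the reconstruction term prevents the trivial all-zero code.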


Keywords: Feature space · Sparse representation · Sparse code · Deep neural network · Dictionary learning



This work was supported by ANR (the French National Research Agency) as part of the ASAP project under grant ANR-09-EMER-001-04.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Sébastien Rebecchi (1)
  • Hélène Paugam-Moisy (1, email author)
  • Michèle Sebag (1, 2)

  1. CNRS, LRI UMR 8623, TAO, INRIA Saclay, Université Paris-Sud 11, Orsay, France
  2. CNRS, LIRIS UMR 5205, Université Lumière Lyon 2, Bron, France
