Learning Sparse Features with an Auto-Associator

Part of the book series: Studies in Computational Intelligence (SCI, volume 557)

Abstract

A major issue in statistical machine learning is the design of a representation, or feature space, that facilitates the learning task at hand. Sparse representations are particularly well suited to discriminant learning: on the one hand, they are robust to noise; on the other hand, they disentangle the factors of variation that are mixed up in dense representations, favoring the separability and interpretation of data. This chapter focuses on auto-associators (AAs), i.e., multi-layer neural networks trained to encode and decode the data, thereby de facto defining a feature space. AAs, first investigated in the 1980s, have recently been reconsidered as building blocks for deep neural networks. This chapter surveys related work on building sparse representations and presents a new non-linear explicit sparse representation method, referred to as the Sparse Auto-Associator (SAA), which integrates a sparsity objective within the standard auto-associator learning criterion. A comparative empirical validation of SAAs on state-of-the-art handwritten digit recognition benchmarks shows that SAAs outperform standard auto-associators in terms of classification performance and yield results similar to those of denoising auto-associators. Furthermore, SAAs make it possible to control the representation size to some extent, through a conservative pruning of the feature space.
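
For illustration, the sketch below shows one simple way an auto-associator can be trained with a sparsity term added to its reconstruction objective. It is a minimal NumPy sketch assuming a squared reconstruction error plus an L1 penalty on the hidden code, sigmoid units, untied weights, and plain gradient descent; the chapter's actual SAA criterion and sparsification scheme may differ in their details.

```python
# Minimal sketch of an auto-associator with a sparsity term in its training
# criterion (illustrative assumptions: squared reconstruction error, L1 penalty
# on the hidden code, sigmoid units, plain stochastic gradient descent).
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

class SparseAutoAssociator:
    def __init__(self, n_in, n_hidden, sparsity_weight=0.1, lr=0.1):
        self.W1 = rng.normal(0.0, 0.01, (n_in, n_hidden))   # encoder weights
        self.W2 = rng.normal(0.0, 0.01, (n_hidden, n_in))   # decoder weights
        self.b1 = np.zeros(n_hidden)
        self.b2 = np.zeros(n_in)
        self.lam = sparsity_weight
        self.lr = lr

    def encode(self, x):
        # Hidden code, i.e. the learned feature vector.
        return sigmoid(x @ self.W1 + self.b1)

    def train_step(self, x):
        # Forward pass: encode the input, then decode it back.
        h = self.encode(x)
        y = sigmoid(h @ self.W2 + self.b2)
        # Training criterion: reconstruction error + sparsity penalty on h.
        loss = np.mean((y - x) ** 2) + self.lam * np.mean(np.abs(h))
        # Backward pass: gradients of the criterion, sigmoid derivatives included.
        d_out = (2.0 * (y - x) / x.size) * y * (1.0 - y)
        d_hid = (d_out @ self.W2.T + self.lam * np.sign(h) / h.size) * h * (1.0 - h)
        self.W2 -= self.lr * np.outer(h, d_out)
        self.b2 -= self.lr * d_out
        self.W1 -= self.lr * np.outer(x, d_hid)
        self.b1 -= self.lr * d_hid
        return loss
```

Training then amounts to repeatedly calling train_step on input vectors (e.g., digit images scaled to [0, 1]); after training, encode(x) yields the sparse feature vector on which a classifier can be trained.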

Notes

  1. Original MNIST database: http://yann.lecun.com/exdb/mnist/.

  2. MNIST variants site: http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/MnistVariations.

  3. The probabilistic sparsification heuristic was also experimented with and found to yield similar results (omitted for the sake of brevity).

  4. All statistical tests are heteroscedastic (unequal-variance) two-sided t-tests. A difference is considered significant if the p-value is less than 0.001; an illustrative sketch of such a test is given after these notes.

  5. Complementary experiments, varying the pruning threshold in a range around 0, yield the same performance (results omitted for brevity).
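
As a purely illustrative sketch of the test protocol in note 4, the snippet below runs an unequal-variance (Welch) two-sided t-test with SciPy; the per-run accuracy values are hypothetical placeholders, not results from the chapter.

```python
# Illustrative only: Welch's two-sided t-test (heteroscedastic), as in note 4.
# The accuracy lists below are hypothetical placeholders, not chapter results.
from scipy import stats

acc_model_a = [0.972, 0.969, 0.974, 0.971, 0.970]  # hypothetical per-run accuracies
acc_model_b = [0.958, 0.961, 0.957, 0.960, 0.959]  # hypothetical per-run accuracies

t_stat, p_value = stats.ttest_ind(acc_model_a, acc_model_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}, significant: {p_value < 0.001}")
```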

Acknowledgments

This work was supported by ANR (the French National Research Agency) as part of the ASAP project under grant ANR_09_EMER_001_04.

Author information

Corresponding author

Correspondence to Hélène Paugam-Moisy.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rebecchi, S., Paugam-Moisy, H., Sebag, M. (2014). Learning Sparse Features with an Auto-Associator. In: Kowaliw, T., Bredeche, N., Doursat, R. (eds) Growing Adaptive Machines. Studies in Computational Intelligence, vol 557. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55337-0_4

  • DOI: https://doi.org/10.1007/978-3-642-55337-0_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55336-3

  • Online ISBN: 978-3-642-55337-0

  • eBook Packages: Engineering (R0)
