Semi Supervised Autoencoders: Better Focusing Model Capacity during Feature Extraction

  • Hani Almousli
  • Pascal Vincent
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8226)


Previous work showed that unsupervised layer-wise pre-training can be used to overcome the difficulty of training a deep architecture. In this paper, we address one of the limitations of using unsupervised models such as regularized autoencoders to learn features that we hope will be useful for a subsequent supervised task: their blindness to that specific task. We propose to change the cost function so that it focuses on accurately reconstructing the input features that appear most useful for the supervised task. This is achieved by measuring the minimized reconstruction error with an A-norm (i.e., ||v||_A^2 = v^T A v) instead of the Euclidean norm. Through the choice of an appropriate matrix A, the capacity of the model can be steered towards modeling information relevant to the classification task. Comparative experiments with the proposed denoising autoencoder variant show that this way of proceeding yields extracted features that achieve better classification performance on several datasets.
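The core idea can be sketched numerically. Below is a minimal NumPy illustration of the A-norm reconstruction error with a diagonal A; the relevance weights are hypothetical placeholders (the paper derives A from the supervised task), and this is a sketch of the cost, not the full denoising autoencoder training procedure.

```python
import numpy as np

def a_norm_sq(v, A):
    """Squared A-norm ||v||_A^2 = v^T A v (reduces to squared Euclidean when A = I)."""
    return float(v @ A @ v)

# Hypothetical per-feature relevance weights, chosen here only for illustration.
weights = np.array([2.0, 1.0, 0.5])
A = np.diag(weights)  # diagonal A: a simple per-feature re-weighting

x     = np.array([1.0, 0.0, 1.0])   # input
x_hat = np.array([0.5, 0.0, 0.0])   # autoencoder reconstruction

err_A      = a_norm_sq(x - x_hat, A)          # weighted reconstruction error -> 1.0
err_euclid = a_norm_sq(x - x_hat, np.eye(3))  # ordinary squared error        -> 1.25

# A reconstruction mistake on the heavily weighted feature 0 is penalized
# more, while a mistake on the down-weighted feature 2 is penalized less,
# steering model capacity towards the features deemed task-relevant.
```

With A = I this is exactly the usual squared Euclidean reconstruction cost, so the A-norm variant is a strict generalization: the choice of A determines where reconstruction accuracy (and hence model capacity) is spent.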


Neural Network · Deep Learning · Semi-Supervised Learning · Pre-training · Learning Representation · Autoencoder · Denoising Autoencoder · Convolutional Net





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Hani Almousli (1)
  • Pascal Vincent (1)
  1. Dept. IRO, Université de Montréal, Montréal, Canada
