An Information Theoretic Approach to the Autoencoder

  • Vincenzo Crescimanna
  • Bruce Graham
Conference paper
Part of the Proceedings of the International Neural Networks Society book series (INNS, volume 1)


We present a variation of the autoencoder (AE) that explicitly maximizes the mutual information between the input data and the hidden representation. By construction, the proposed model, the InfoMax Autoencoder (IMAE), learns a robust representation and good prototypes of the data. IMAE is compared both theoretically and computationally with state-of-the-art models: the Denoising and Contractive Autoencoders in the one-hidden-layer setting, and the Variational Autoencoder in the multi-layer case. Computational experiments on the MNIST and Fashion-MNIST datasets demonstrate, in particular, the strong clustering performance of IMAE.
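The infomax objective described above can be illustrated with a toy sketch (our illustration, not the authors' code). By the standard variational bound I(X; Z) ≥ H(X) + E[log q(X | Z)], maximizing the mutual information between input and hidden code can be approached by minimizing the reconstruction error of a decoder; under a Gaussian decoder this reduces to mean squared error. A minimal one-hidden-layer linear autoencoder trained on reconstruction loss therefore serves as a crude proxy for the MI-maximization principle (it is not the actual IMAE objective, which this page does not spell out):

```python
import numpy as np

# Toy sketch: one-hidden-layer linear autoencoder trained by gradient
# descent on mean squared reconstruction error, a variational proxy for
# maximizing I(X; Z) under a Gaussian decoder model.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # toy data: 200 samples, 8 features
W_enc = rng.normal(scale=0.3, size=(8, 3))    # encoder weights (8 -> 3)
W_dec = rng.normal(scale=0.3, size=(3, 8))    # decoder weights (3 -> 8)
lr = 0.05

mse_init = np.mean((X @ W_enc @ W_dec - X) ** 2)

for _ in range(1000):
    Z = X @ W_enc                 # hidden representation
    X_hat = Z @ W_dec             # reconstruction
    err = X_hat - X
    # gradients of the mean squared reconstruction error
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
print(mse_init, mse)              # reconstruction error drops during training
```

With a 3-dimensional code, this linear model can at best retain the top-3 principal subspace of the data, so the residual error stays bounded away from zero; a nonlinear encoder/decoder, as in the AE variants compared in the paper, removes that restriction.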


Keywords: Infomax · Autoencoder · Representation learning



This research is funded by the University of Stirling CONTEXT research programme and by Bambu (B2B Robo advisor, Singapore).


  1. Baldi, P., Hornik, K.: Neural networks and principal component analysis: learning from examples without local minima. Neural Netw. 2(1), 53–58 (1989)
  2. Barlow, H.: Possible principles underlying the transformation of sensory messages. In: Sensory Communication. MIT Press, Cambridge (1961)
  3. Bell, A.J., Sejnowski, T.J.: An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7(6), 1129–1159 (1995)
  4. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
  5. Brunel, N., Nadal, J.P.: Mutual information, Fisher information, and population coding. Neural Comput. 10(7), 1731–1757 (1998)
  6. Hyvärinen, A., Hurri, J., Hoyer, P.O.: Natural Image Statistics: A Probabilistic Approach to Early Computational Vision. Springer, London (2009)
  7. Hyvärinen, A.: New approximations of differential entropy for independent component analysis and projection pursuit. Neural Inf. Process. Syst. 10, 273–279 (1998)
  8. Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural Netw. 13(4–5), 411–430 (2000)
  9. Karhunen, J., Raiko, T., Cho, K.: Unsupervised deep learning: a short review. In: Advances in Independent Component Analysis and Learning Machines, pp. 125–142 (2015)
  10. Kim, B., Rudin, C., Shah, J.A.: The Bayesian case model: a generative approach for case-based reasoning and prototype classification. In: Advances in Neural Information Processing Systems, pp. 1952–1960 (2014)
  11. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: Proceedings of the 2nd International Conference on Learning Representations (2014)
  12. Lee, T.-W., Girolami, M., Bell, A.J., Sejnowski, T.J.: A unifying information-theoretic framework for independent component analysis. Comput. Math. Appl. 39(11), 1–21 (2000)
  13. Linsker, R.: Self-organization in a perceptual network. Computer 21(3), 105–117 (1988)
  14. van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
  15. Papoulis, A., Pillai, S.U.: Probability, Random Variables, and Stochastic Processes. Tata McGraw-Hill Education, New Delhi (2002)
  16. Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: explicit invariance during feature extraction. In: Proceedings of the 28th International Conference on Machine Learning, pp. 833–840. Omnipress (2011)
  17. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
  18. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103 (2008)
  19. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747 (2017)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Stirling, Stirling, UK
