On the Expressive Power of Deep Architectures

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6925)

Abstract

Deep architectures are families of functions corresponding to deep circuits. Deep Learning algorithms are based on parametrizing such circuits and tuning their parameters so as to approximately optimize some training objective. Whereas training deep architectures was long thought to be too difficult, several successful algorithms have been proposed in recent years. We review some of the theoretical motivations for deep architectures, as well as some of their practical successes, and propose directions of investigation to address some of the remaining challenges.
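The abstract's framing, a deep architecture as a parametrized circuit whose parameters are tuned to approximately optimize a training objective, can be made concrete with a small example. The sketch below is not from the paper; the network shape, learning rate, and XOR task are illustrative assumptions. It fits a two-hidden-layer network by gradient descent to XOR, a function that no depth-one linear circuit can represent, hinting at the depth-versus-expressiveness questions the paper studies.

```python
# A minimal sketch (illustrative, not the paper's method): a deep circuit is
# parametrized by weight matrices, and learning tunes those parameters by
# gradient descent on a training objective (here, mean squared error).
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR, which no linear (depth-one) circuit can represent.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def init(n_in, n_out):
    return rng.normal(0, 1.0, (n_in, n_out)), np.zeros(n_out)

W1, b1 = init(2, 8)
W2, b2 = init(8, 8)
W3, b3 = init(8, 1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for step in range(5000):
    # Forward pass: evaluate the deep circuit layer by layer.
    h1 = np.tanh(X @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    out = sigmoid(h2 @ W3 + b3)

    # Training objective: mean squared error over the 4 examples.
    loss = np.mean((out - y) ** 2)

    # Backward pass: gradients of the objective w.r.t. each parameter.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)   # MSE and sigmoid
    dW3, db3 = h2.T @ d_out, d_out.sum(0)
    d_h2 = (d_out @ W3.T) * (1 - h2 ** 2)              # tanh derivative
    dW2, db2 = h1.T @ d_h2, d_h2.sum(0)
    d_h1 = (d_h2 @ W2.T) * (1 - h1 ** 2)
    dW1, db1 = X.T @ d_h1, d_h1.sum(0)

    # Tune parameters: one gradient-descent step.
    lr = 0.5
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    W3 -= lr * dW3; b3 -= lr * db3

print(f"final loss: {loss:.4f}")  # should typically be near zero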




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bengio, Y., Delalleau, O. (2011). On the Expressive Power of Deep Architectures. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2011. Lecture Notes in Computer Science (LNAI), vol 6925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24412-4_3

  • DOI: https://doi.org/10.1007/978-3-642-24412-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24411-7

  • Online ISBN: 978-3-642-24412-4

  • eBook Packages: Computer Science (R0)
