Introduction and Motivation

Deep Neural Networks in a Mathematical Framework

Part of the book series: SpringerBriefs in Computer Science

Abstract

This chapter serves as a basic introduction to neural networks, including their history and some applications in which they have achieved state-of-the-art results.

Notes

  1. E.g., self-driving cars, finance, and other important systems.

  2. The perceptron is, however, just a specific case of logistic regression, which has roots going back to 1944 and earlier; see [7], for example.

  3. There are, broadly, two main classes of deep networks: supervised networks, which require a specific target for each input, and unsupervised networks, which have no specific targets and only look to find structure within the input data. We can also have semi-supervised learning, in which some proportion of the training examples have targets, but this is less common. Finally, there is a further category called reinforcement learning, in which an autonomous agent attempts to learn a task; the neural networks used within it are still often supervised, in that they attempt to predict the value of an action given the current state.

  4. MNIST is from [34].

  5. E.g., Wikipedia articles and LaTeX documents.

  6. There were, however, other major contributing factors to the first so-called AI winter, including over-promising to grant agencies when the technology of the time could not deliver; see [30] for more.

  7. Perceptrons have no hidden layers; the sketch following these notes contrasts a perceptron with a one-hidden-layer network.

  8. This was also inspired by biological function, as the ReLU activation function is a realistic description of neuron firing [20].
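The following is a minimal sketch, not from the book, illustrating Notes 7 and 8: a perceptron applies a single thresholded affine map with no hidden layer, while a network with one hidden layer of ReLU units (ReLU(x) = max(0, x)) composes an additional nonlinearity. All names, dimensions, and weights below are hypothetical and chosen only for illustration.

    import numpy as np

    # ReLU activation (Note 8): elementwise max(0, x).
    def relu(z):
        return np.maximum(0.0, z)

    # Heaviside step used by the classical perceptron.
    def step(z):
        return (z >= 0).astype(float)

    def perceptron(x, W, b):
        # Note 7: a single affine map plus a threshold, with no hidden layer.
        return step(W @ x + b)

    def one_hidden_layer_net(x, W1, b1, W2, b2):
        # One hidden layer of ReLU units followed by an affine output map.
        h = relu(W1 @ x + b1)
        return W2 @ h + b2

    # Hypothetical sizes: 3 inputs, 4 hidden units, 1 output.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(3)
    W, b = rng.standard_normal((1, 3)), rng.standard_normal(1)
    W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
    W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal(1)

    print(perceptron(x, W, b))                      # thresholded output, e.g. [1.] or [0.]
    print(one_hidden_layer_net(x, W1, b1, W2, b2))  # real-valued output

The contrast is purely structural: the perceptron computes a single thresholded affine map of its input, while the second network interposes a nonlinear hidden layer between two affine maps.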

References

  1. D. Ackley, G. Hinton, T. Sejnowski, A learning algorithm for Boltzmann machines. Cogn. Sci. 9(1), 147–169 (1985)

  2. M. Arjovsky, L. Bottou, Towards principled methods for training generative adversarial networks. arXiv:1701.04862 (2017, preprint)

  3. M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN. arXiv:1701.07875 (2017, preprint)

  4. D. Ballard, Modular learning in neural networks, in AAAI (1987), pp. 279–284.

  5. A. Baydin, B. Pearlmutter, A. Radul, J. Siskind, Automatic differentiation in machine learning: a survey. arXiv:1502.05767 (2015, preprint)

  6. Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, Greedy layer-wise training of deep networks, in Advances in Neural Information Processing Systems (2007), pp. 153–160

  7. J. Berkson, Application of the logistic function to bio-assay. J. Am. Stat. Assoc. 39(227), 357–365 (1944)

  8. A.L. Caterini, D.E. Chang, A geometric framework for convolutional neural networks. arXiv:1608.04374 (2016, preprint)

  9. A.L. Caterini, D.E. Chang, A novel representation of neural networks. arXiv:1610.01549 (2016, preprint)

  10. D. Cireşan, U. Meier, L. Gambardella, J. Schmidhuber, Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 22(12), 3207–3220 (2010)

  11. D. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289 (2015, preprint)

  12. G. Cybenko, Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)

  13. Y. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, Y. Bengio, Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, in Advances in Neural Information Processing Systems (2014), pp. 2933–2941

  14. R. Eldan, O. Shamir, The power of depth for feedforward neural networks, in Conference on Learning Theory (2016), pp. 907–940

  15. K. Fukushima, S. Miyake, Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition, in Competition and Cooperation in Neural Nets (Springer, Berlin, 1982), pp. 267–285

  16. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems (2014), pp. 2672–2680

  17. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, 2016), http://www.deeplearningbook.org.

  18. B. Graham, Fractional max-pooling. arXiv:1412.6071 (2014, preprint)

  19. A. Graves, Generating sequences with recurrent neural networks. arXiv:1308.0850 (2013, preprint)

  20. R. Hahnloser, R. Sarpeshkar, M.A. Mahowald, R. Douglas, H. Seung, Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405(6789), 947–951 (2000)

  21. K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: surpassing human-level performance on imagenet classification, in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034

  22. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778

  23. K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN. arXiv:1703.06870 (2017, preprint)

  24. G. Hinton, S. Osindero, Y. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)

  25. S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen [Investigations of dynamic neural networks], Diploma thesis, Technische Universität München, 91, 1991

  26. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  27. S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent Neural Networks (IEEE Press, 2001)

  28. K. Hornik, Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991)

  29. A. Ivakhnenko, V. Lapa, Cybernetic predicting devices, Technical report, DTIC Document, 1966

  30. L. Kanal, Perceptron, in Encyclopedia of Computer Science (Wiley, Chichester, 2003)

  31. Y. LeCun, D. Touresky, G. Hinton, T. Sejnowski, A theoretical framework for back-propagation, in The Connectionist Models Summer School, vol. 1 (1988), pp. 21–28

  32. Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, L. Jackel, Handwritten digit recognition with a back-propagation network, in Advances in Neural Information Processing Systems (1990), pp. 396–404

  33. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  34. Y. LeCun, C. Cortes, C. Burges, MNIST handwritten digit database. AT&T Labs [Online]. http://yann.lecun.com/exdb/mnist, 2 (2010)

  35. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)

  36. H. Lin, M. Tegmark, Why does deep and cheap learning work so well? arXiv:1608.08225 (2016, preprint)

  37. H. Lutkepohl, Handbook of Matrices (Wiley, Hoboken, 1997)

  38. A. Maas, A. Hannun, A. Ng, Rectifier nonlinearities improve neural network acoustic models, in Proceedings of ICML, vol. 30 (2013)

  39. W. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. 5(4), 115–133 (1943)

  40. M. Minsky, S. Papert, Perceptrons (MIT press, Cambridge, 1969)

  41. V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller et al., Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  42. G. Montufar, R. Pascanu, K. Cho, Y. Bengio, On the number of linear regions of deep neural networks, in Advances in Neural Information Processing Systems (2014), pp. 2924–2932

  43. V. Nair, G. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010), pp. 807–814

  44. R. Pascanu, G. Montufar, Y. Bengio, On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv:1312.6098 (2013, preprint)

  45. A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434 (2015, preprint)

  46. F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6), 386 (1958)

  47. D. Rumelhart, G. Hinton, R. Williams, Learning internal representations by error propagation, Technical report, California Univ San Diego La Jolla Inst for Cognitive Science, 1985

  48. T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, in Advances in Neural Information Processing Systems (2016), pp. 2226–2234

  49. G. Saon, T. Sercu, S.J. Rennie, H. Jeff Kuo, The IBM 2016 English conversational telephone speech recognition system. arXiv:1604.08242 (2016, preprint)

  50. J. Schmidhuber, Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)

  51. U. Shaham, A. Cloninger, R. Coifman, Provable approximation properties for deep neural networks. Appl. Comput. Harmon. Anal. 44(3), 537–557 (2018)

  52. D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, et al., Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  53. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014, preprint)

  54. S. Vallender, Calculation of the Wasserstein distance between probability distributions on the line. Theory Probab. Appl. 18(4), 784–786 (1974)

  55. O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: A neural image caption generator, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 3156–3164

  56. P. Werbos, Applications of advances in nonlinear sensitivity analysis, in System Modeling and Optimization (Springer, Berlin, 1982), pp. 762–770

  57. B. Widrow, M. Hoff, Associative storage and retrieval of digital information in networks of adaptive “neurons”, in Biological Prototypes and Synthetic Systems (Springer, Berlin, 1962), pp. 160–160

  58. R. Williams, D. Zipser, A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1(2), 270–280 (1989)

  59. Z. Xie, A. Avati, N. Arivazhagan, D. Jurafsky, A. Ng, Neural language correction with character-based attention. arXiv:1603.09727 (2016, preprint)

Copyright information

© 2018 The Author(s)

Cite this chapter

Caterini, A.L., Chang, D.E. (2018). Introduction and Motivation. In: Deep Neural Networks in a Mathematical Framework. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-75304-1_1

  • DOI: https://doi.org/10.1007/978-3-319-75304-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75303-4

  • Online ISBN: 978-3-319-75304-1
