Abstract
This chapter serves as a basic introduction to neural networks, including their history and some applications in which they have achieved state-of-the-art results.
Notes
1. E.g. self-driving cars, finance, and other important systems.
2. Although the perceptron is just a specific case of logistic regression, which has roots dating back to 1944 and earlier; see [7], for example. (A short sketch of this relationship follows these notes.)
3. There are generally two main classes of deep networks: supervised networks, which require a specific target for each input, and unsupervised networks, which have no specific targets and instead look for structure within the input data. There is also semi-supervised learning, in which some proportion of the training examples have targets, but this is less common. Finally, another category called reinforcement learning exists, in which an autonomous agent attempts to learn a task; the neural networks used within it are still often supervised, in that they attempt to predict the value of an action given the current state. (A small illustration of these data regimes follows these notes.)
4. MNIST is from [34].
5. E.g. Wikipedia articles, LaTeX documents.
6. Although there were other major contributing factors to the first so-called A.I. winter, including over-promising to grant agencies when the technology of the time could not deliver; see [30] for more.
7. Perceptrons have no hidden layers.
8. This was also inspired by biological function, as the ReLU activation function is a realistic description of neuron firing [20]. (ReLU is defined after these notes.)
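A minimal sketch of note 2's claim, in notation assumed here rather than taken from the chapter: logistic regression maps an input x to a smooth probability via an affine score,

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \hat{y}_{\mathrm{logistic}} = \sigma(w^\top x + b),

whereas the perceptron hard-thresholds the same score,

\hat{y}_{\mathrm{perceptron}} = \begin{cases} 1, & w^\top x + b > 0, \\ 0, & \text{otherwise.} \end{cases}

Since \sigma(z) > 1/2 exactly when z > 0, thresholding the logistic output at 1/2 recovers the perceptron's decision rule; in this sense the perceptron is a special case of logistic regression.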
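To make the data regimes of note 3 concrete, here is a small hypothetical sketch (the arrays, sizes, and variable names are invented purely for illustration):

```python
import numpy as np

# Supervised: every input x_i is paired with a target y_i (e.g. a class label),
# and the network is trained to map x_i to y_i.
X = np.random.randn(100, 8)             # 100 examples with 8 features each
y = np.random.randint(0, 2, size=100)   # one target per example
supervised_data = list(zip(X, y))

# Unsupervised: inputs only; the network looks for structure in X itself,
# such as clusters or a low-dimensional representation.
unsupervised_data = X

# Semi-supervised: only a fraction of examples carry targets (here the first 10).
semi_supervised_data = list(zip(X[:10], y[:10])) + [(x, None) for x in X[10:]]
```

In reinforcement learning the data are instead generated by interaction with an environment, but the value-prediction network mentioned in note 3 is still trained in this supervised fashion, with state–action inputs and estimated values as targets.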
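For reference on note 8, the rectified linear unit is simply

\mathrm{ReLU}(x) = \max(0, x),

i.e. the unit is silent for inputs below zero and responds linearly above it, loosely mirroring the neuron firing behaviour described in [20].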
References
D. Ackley, G. Hinton, T. Sejnowski, A learning algorithm for Boltzmann machines. Cogn. Sci. 9(1), 147–169 (1985)
M. Arjovsky, L. Bottou, Towards principled methods for training generative adversarial networks. arXiv:1701.04862 (2017, preprint)
M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN. arXiv:1701.07875 (2017, preprint)
D. Ballard, Modular learning in neural networks, in AAAI (1987), pp. 279–284
A. Baydin, B. Pearlmutter, A. Radul, J. Siskind, Automatic differentiation in machine learning: a survey. arXiv:1502.05767 (2015, preprint)
Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, Greedy layer-wise training of deep networks, in Advances in Neural Information Processing Systems (2007), pp. 153–160
J. Berkson, Application of the logistic function to bio-assay. J. Am. Stat. Assoc. 39(227), 357–365 (1944)
A.L. Caterini, D.E. Chang, A geometric framework for convolutional neural networks. arXiv:1608.04374 (2016, preprint)
A.L. Caterini, D.E. Chang, A novel representation of neural networks. arXiv:1610.01549 (2016, preprint)
D. Cireşan, U. Meier, L. Gambardella, J. Schmidhuber, Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 22(12), 3207–3220 (2010)
D. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289 (2015, preprint)
G. Cybenko, Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)
Y. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, Y. Bengio, Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, in Advances in Neural Information Processing Systems (2014), pp. 2933–2941
R. Eldan, O. Shamir, The power of depth for feedforward neural networks, in Conference on Learning Theory (2016), pp. 907–940
K. Fukushima, S. Miyake, Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition, in Competition and Cooperation in Neural Nets (Springer, Berlin, 1982), pp. 267–285
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems (2014), pp. 2672–2680
I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, 2016), http://www.deeplearningbook.org
B. Graham, Fractional max-pooling. arXiv:1412.6071 (2014, preprint)
A. Graves, Generating sequences with recurrent neural networks. arXiv:1308.0850 (2013, preprint)
R. Hahnloser, R. Sarpeshkar, M.A. Mahowald, R. Douglas, H. Seung, Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405(6789), 947–951 (2000)
K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034
K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778
K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN. arXiv:1703.06870 (2017, preprint)
G. Hinton, S. Osindero, Y. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen [Studies on dynamic neural networks], Diploma thesis, Technische Universität München, 1991
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent Neural Networks (IEEE Press, 2001)
K. Hornik, Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991)
A. Ivakhnenko, V. Lapa, Cybernetic predicting devices, Technical report, DTIC Document, 1966
L. Kanal, Perceptron, in Encyclopedia of Computer Science (Wiley, Chichester, 2003)
Y. LeCun, D. Touretzky, G. Hinton, T. Sejnowski, A theoretical framework for back-propagation, in The Connectionist Models Summer School, vol. 1 (1988), pp. 21–28
Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, L. Jackel, Handwritten digit recognition with a back-propagation network, in Advances in Neural Information Processing Systems (1990), pp. 396–404
Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Y. LeCun, C. Cortes, C. Burges, MNIST handwritten digit database. AT&T Labs [Online]. http://yann.lecun.com/exdb/mnist (2010)
Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)
H. Lin, M. Tegmark, Why does deep and cheap learning work so well? arXiv:1608.08225 (2016, preprint)
H. Lütkepohl, Handbook of Matrices (Wiley, Hoboken, 1997)
A. Maas, A. Hannun, A. Ng, Rectifier nonlinearities improve neural network acoustic models, in Proceedings of ICML, vol. 30 (2013)
W. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. 5(4), 115–133 (1943)
M. Minsky, S. Papert, Perceptrons (MIT Press, Cambridge, 1969)
V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller et al., Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
G. Montufar, R. Pascanu, K. Cho, Y. Bengio, On the number of linear regions of deep neural networks, in Advances in Neural Information Processing Systems (2014), pp. 2924–2932
V. Nair, G. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010), pp. 807–814
R. Pascanu, G. Montufar, Y. Bengio, On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv:1312.6098 (2013, preprint)
A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434 (2015, preprint)
F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6), 386 (1958)
D. Rumelhart, G. Hinton, R. Williams, Learning internal representations by error propagation, Technical report, University of California, San Diego, Institute for Cognitive Science, 1985
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, in Advances in Neural Information Processing Systems (2016), pp. 2226–2234
G. Saon, T. Sercu, S.J. Rennie, H.-K.J. Kuo, The IBM 2016 English conversational telephone speech recognition system. arXiv:1604.08242 (2016, preprint)
J. Schmidhuber, Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
U. Shaham, A. Cloninger, R. Coifman, Provable approximation properties for deep neural networks. Appl. Comput. Harmon. Anal. 44(3), 537–557 (2018)
D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, et al., Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014, preprint)
S. Vallender, Calculation of the Wasserstein distance between probability distributions on the line. Theory Probab. Appl. 18(4), 784–786 (1974)
O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: a neural image caption generator, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 3156–3164
P. Werbos, Applications of advances in nonlinear sensitivity analysis, in System Modeling and Optimization (Springer, Berlin, 1982), pp. 762–770
B. Widrow, M. Hoff, Associative storage and retrieval of digital information in networks of adaptive “neurons”, in Biological Prototypes and Synthetic Systems (Springer, Berlin, 1962), p. 160
R. Williams, D. Zipser, A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1(2), 270–280 (1989)
Z. Xie, A. Avati, N. Arivazhagan, D. Jurafsky, A. Ng, Neural language correction with character-based attention. arXiv:1603.09727 (2016, preprint)
Copyright information
© 2018 The Author(s)
Cite this chapter
Caterini, A.L., Chang, D.E. (2018). Introduction and Motivation. In: Deep Neural Networks in a Mathematical Framework. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-75304-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75303-4
Online ISBN: 978-3-319-75304-1