Abstract
This chapter serves as a basic introduction to neural networks, including their history and some applications in which they have achieved state-of-the-art results.
Notes
1. E.g. self-driving cars, finance, and other important systems.
2. Although the perceptron is just a specific case of logistic regression, which has roots dating back to 1944 and earlier; see [7], for example. (A short sketch of this relationship follows these notes.)
3. There are generally two main classes of deep networks: supervised networks, which require a specific target for each input, and unsupervised networks, which have no specific targets and instead look for structure within the input data. There is also semi-supervised learning, in which some proportion of the training examples have targets, but this is less common. Finally, another category called reinforcement learning exists, in which an autonomous agent attempts to learn a task; the neural networks used within it are still often supervised, in that they attempt to predict the value of an action given the current state. (A small illustration of these data regimes follows these notes.)
4. MNIST is from [34].
5. E.g. Wikipedia articles, LaTeX documents.
6. Although there were other major contributing factors to the first so-called A.I. winter, including over-promising to grant agencies when the technology of the time could not deliver; see [30] for more.
7. Perceptrons have no hidden layers.
8. This was also inspired by biological function, as the ReLU activation function is a realistic description of neuron firing [20]. (ReLU is defined after these notes.)
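A minimal sketch of note 2's claim, in notation assumed here rather than taken from the chapter: logistic regression maps an input x to a smooth probability via an affine score,

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \hat{y}_{\mathrm{logistic}} = \sigma(w^\top x + b),

whereas the perceptron hard-thresholds the same score,

\hat{y}_{\mathrm{perceptron}} = \begin{cases} 1, & w^\top x + b > 0, \\ 0, & \text{otherwise.} \end{cases}

Since \sigma(z) > 1/2 exactly when z > 0, thresholding the logistic output at 1/2 recovers the perceptron's decision rule; in this sense the perceptron is a special case of logistic regression.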
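To make the data regimes of note 3 concrete, here is a small hypothetical sketch (the arrays, sizes, and variable names are invented purely for illustration):

```python
import numpy as np

# Supervised: every input x_i is paired with a target y_i (e.g. a class label),
# and the network is trained to map x_i to y_i.
X = np.random.randn(100, 8)             # 100 examples with 8 features each
y = np.random.randint(0, 2, size=100)   # one target per example
supervised_data = list(zip(X, y))

# Unsupervised: inputs only; the network looks for structure in X itself,
# such as clusters or a low-dimensional representation.
unsupervised_data = X

# Semi-supervised: only a fraction of examples carry targets (here the first 10).
semi_supervised_data = list(zip(X[:10], y[:10])) + [(x, None) for x in X[10:]]
```

In reinforcement learning the data are instead generated by interaction with an environment, but the value-prediction network mentioned in note 3 is still trained in this supervised fashion, with state–action inputs and estimated values as targets.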
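For reference on note 8, the rectified linear unit is simply

\mathrm{ReLU}(x) = \max(0, x),

i.e. the unit is silent for inputs below zero and responds linearly above it, loosely mirroring the neuron firing behaviour described in [20].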
References
D. Ackley, G. Hinton, T. Sejnowski, A learning algorithm for Boltzmann machines. Cogn. Sci. 9(1), 147–169 (1985)
M. Arjovsky, L. Bottou, Towards principled methods for training generative adversarial networks. arXiv:1701.04862 (2017, preprint)
M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN. arXiv:1701.07875 (2017, preprint)
D. Ballard, Modular learning in neural networks, in AAAI (1987), pp. 279–284
A. Baydin, B. Pearlmutter, A. Radul, J. Siskind, Automatic differentiation in machine learning: a survey. arXiv:1502.05767 (2015, preprint)
Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, Greedy layer-wise training of deep networks, in Advances in Neural Information Processing Systems (2007), pp. 153–160
J. Berkson, Application of the logistic function to bio-assay. J. Am. Stat. Assoc. 39(227), 357–365 (1944)
A.L. Caterini, D.E. Chang, A geometric framework for convolutional neural networks. arXiv:1608.04374 (2016, preprint)
A.L. Caterini, D.E. Chang, A novel representation of neural networks. arXiv:1610.01549 (2016, preprint)
D. Cireşan, U. Meier, L. Gambardella, J. Schmidhuber, Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 22(12), 3207–3220 (2010)
D. Clevert, T. Unterthiner, S. Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs). arXiv:1511.07289 (2015, preprint)
G. Cybenko, Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)
Y. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, Y. Bengio, Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, in Advances in Neural Information Processing Systems (2014), pp. 2933–2941
R. Eldan, O. Shamir, The power of depth for feedforward neural networks, in Conference on Learning Theory (2016), pp. 907–940
K. Fukushima, S. Miyake, Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition, in Competition and Cooperation in Neural Nets (Springer, Berlin, 1982), pp. 267–285
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems (2014), pp. 2672–2680
I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, 2016), http://www.deeplearningbook.org
B. Graham, Fractional max-pooling. arXiv:1412.6071 (2014, preprint)
A. Graves, Generating sequences with recurrent neural networks. arXiv:1308.0850 (2013, preprint)
R. Hahnloser, R. Sarpeshkar, M.A. Mahowald, R. Douglas, H. Seung, Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405(6789), 947–951 (2000)
K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, in Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1026–1034
K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778
K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN. arXiv:1703.06870 (2017, preprint)
G. Hinton, S. Osindero, Y. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen [Studies on dynamic neural networks], Diploma thesis, Technische Universität München, 1991
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent Neural Networks (IEEE Press, 2001)
K. Hornik, Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2), 251–257 (1991)
A. Ivakhnenko, V. Lapa, Cybernetic predicting devices, Technical report, DTIC Document, 1966
L. Kanal, Perceptron, in Encyclopedia of Computer Science (Wiley, Chichester, 2003)
Y. LeCun, D. Touretzky, G. Hinton, T. Sejnowski, A theoretical framework for back-propagation, in The Connectionist Models Summer School, vol. 1 (1988), pp. 21–28
Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, L. Jackel, Handwritten digit recognition with a back-propagation network, in Advances in Neural Information Processing Systems (1990), pp. 396–404
Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Y. LeCun, C. Cortes, C. Burges, MNIST handwritten digit database. AT&T Labs [Online]. http://yann.lecun.com/exdb/mnist (2010)
Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)
H. Lin, M. Tegmark, Why does deep and cheap learning work so well? arXiv:1608.08225 (2016, preprint)
H. Lütkepohl, Handbook of Matrices (Wiley, Hoboken, 1997)
A. Maas, A. Hannun, A. Ng, Rectifier nonlinearities improve neural network acoustic models, in Proceedings of ICML, vol. 30 (2013)
W. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. 5(4), 115–133 (1943)
M. Minsky, S. Papert, Perceptrons (MIT Press, Cambridge, 1969)
V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller et al., Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
G. Montufar, R. Pascanu, K. Cho, Y. Bengio, On the number of linear regions of deep neural networks, in Advances in Neural Information Processing Systems (2014), pp. 2924–2932
V. Nair, G. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010), pp. 807–814
R. Pascanu, G. Montufar, Y. Bengio, On the number of response regions of deep feed forward networks with piece-wise linear activations. arXiv:1312.6098 (2013, preprint)
A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434 (2015, preprint)
F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65(6), 386 (1958)
D. Rumelhart, G. Hinton, R. Williams, Learning internal representations by error propagation, Technical report, University of California, San Diego, Institute for Cognitive Science, 1985
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training GANs, in Advances in Neural Information Processing Systems (2016), pp. 2226–2234
G. Saon, T. Sercu, S.J. Rennie, H.-K.J. Kuo, The IBM 2016 English conversational telephone speech recognition system. arXiv:1604.08242 (2016, preprint)
J. Schmidhuber, Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
U. Shaham, A. Cloninger, R. Coifman, Provable approximation properties for deep neural networks. Appl. Comput. Harmon. Anal. 44(3), 537–557 (2018)
D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, et al., Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014, preprint)
S. Vallender, Calculation of the Wasserstein distance between probability distributions on the line. Theory Probab. Appl. 18(4), 784–786 (1974)
O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: a neural image caption generator, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 3156–3164
P. Werbos, Applications of advances in nonlinear sensitivity analysis, in System Modeling and Optimization (Springer, Berlin, 1982), pp. 762–770
B. Widrow, M. Hoff, Associative storage and retrieval of digital information in networks of adaptive “neurons”, in Biological Prototypes and Synthetic Systems (Springer, Berlin, 1962), p. 160
R. Williams, D. Zipser, A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1(2), 270–280 (1989)
Z. Xie, A. Avati, N. Arivazhagan, D. Jurafsky, A. Ng, Neural language correction with character-based attention. arXiv:1603.09727 (2016, preprint)
Copyright information
© 2018 The Author(s)
Cite this chapter
Caterini, A.L., Chang, D.E. (2018). Introduction and Motivation. In: Deep Neural Networks in a Mathematical Framework. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-75304-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75303-4
Online ISBN: 978-3-319-75304-1