Abstract
This chapter introduces feedforward neural networks and the basic terminology of deep learning. It discusses how to represent these abstract, graphical objects as mathematical objects (vectors, matrices and tensors). Rosenblatt’s perceptron rule is presented in detail, which makes it clear that the rule cannot be extended to train a multilayered perceptron. The delta rule is presented as an alternative, and the idea of iterative weight-update procedures is developed with abundant examples, to build both an abstract and a numerical intuition. Backpropagation is explained in detail, with all the calculations for a simple example carried out, and the role of error functions in the whole system is examined. The chapter closes with the first example of Python code in Keras, with all the details of running Python and Keras explained (imports, Keras-specific functions and regular Python functions).
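The forward pass of a small feedforward network of the kind the chapter describes can be sketched in plain Python; the weights, biases and inputs below are illustrative, not taken from the chapter:

```python
import math

def sigmoid(z):
    """Logistic activation applied to a weighted sum."""
    return 1.0 / (1.0 + math.exp(-z))

def unit(x, weights, bias):
    """One unit: weighted sum of inputs plus bias, then activation."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# A tiny two-layer feedforward pass with made-up weights:
x = [0.5, 0.9]                             # input vector
h1 = unit(x, [0.3, -0.2], 0.1)             # hidden unit 1
h2 = unit(x, [0.6, 0.8], -0.1)             # hidden unit 2
y = unit([h1, h2], [1.0, -1.5], 0.05)      # output unit
```

Each unit computes exactly the weighted-sum-plus-activation the chapter formalizes with vectors and matrices; stacking such units gives the layered structure.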
Notes
- 1.
These models are called linear neurons.
- 2.
For linear neurons we keep the same notation but set \(y_{23}:=z_{23}\).
- 3.
Formally speaking, all units using the perceptron rule should be called perceptrons, not just binary threshold units.
- 4.
The target is also called expected value or true label, and it is usually denoted by t.
- 5.
As a simple application, think of an image recognition system for security cameras, where one needs to classify numbers seen regardless of their orientation.
- 6.
This is a modified version of an example given by Geoffrey Hinton.
- 7.
For example, if we only buy chicken, then it would be easy to get the price of the chicken analytically as \(total=price\cdot quantity\), and we get \(price=\frac{total}{quantity}\).
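When several components are bought together the analytic route no longer works, and the iterative alternative the chapter develops can be sketched as repeated small corrections of a guessed price, in the spirit of the delta rule; the quantities, totals and learning rate below are illustrative:

```python
# Iteratively estimate two unknown per-kg prices from observed totals:
# nudge each guess proportionally to the prediction error and the
# corresponding quantity (delta-rule-style update).
meals = [((2.0, 1.0), 13.0),   # (kg of item A, kg of item B), total paid
         ((1.0, 3.0), 14.0),
         ((3.0, 2.0), 21.0)]   # consistent with true prices A = 5, B = 3

prices = [1.0, 1.0]            # guessed estimates
eta = 0.01                     # learning rate

for _ in range(2000):
    for (qa, qb), total in meals:
        predicted = prices[0] * qa + prices[1] * qb
        error = total - predicted
        prices[0] += eta * error * qa
        prices[1] += eta * error * qb
```

After enough passes the guesses settle near the true per-kg prices, without ever solving the system analytically.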
- 8.
In practical terms this might seem far more complicated than simply asking the person serving you lunch the price per kilogram of each component, but you can imagine that the person is the soup vendor from the TV show Seinfeld (116th episode, i.e. S07E06).
- 9.
A guessed estimate. We use this term simply to note that, for now, we should keep things intuitive and not guess an initial value of, e.g., 12000, 4533233456 or 0.0000123, not because it would be impossible to solve, but because many more steps would be needed before the computation assumed a form in which we could see the regularities appear.
- 10.
Not in the sense that they are the same formula, but that they refer to the same process and that one can be derived from the other.
- 11.
For the sake of easy readability, we deliberately combine Newton and Leibniz notation in the rules, since some of them are more intuitive in one, while some of them are more intuitive in the second. We refer the reader back to Chap. 1 where all the formulations in both notations were given.
- 12.
Strictly speaking, we would need \(\frac{\partial E}{\partial y^{(n)}}\) but this generalization is trivial and we chose the simplification since we wanted to improve readability.
- 13.
A definition is circular if the same term occurs in both the definiendum (what is being defined) and definiens (with which it is defined), i.e. on both sides of \(=\) (or more precisely of \(:=\)) and in our case this term could be w. A recursive definition has the same term on both sides, but on the defining side (definiens) it has to be ‘smaller’ so that one could resolve the definition by going back to the starting point.
- 14.
If you recall, the perceptron rule also qualifies as a ‘simpler’ way of learning weights, but it had the major drawback that it cannot be generalized to multiple layers.
- 15.
Although it must be said that the whole field of deep learning is centered around overcoming the problems with gradient descent that arise when using it in deep networks.
- 16.
Cf. G. Hinton’s Coursera course, where this method is elaborated.
- 17.
We must then use the gradient, not individual partial derivatives.
- 18.
This is a modified version of the example by Matt Mazur available at https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/.
- 19.
The only difference is the step for \(\frac{\partial z_F}{\partial w_5}\), where there is a 0 now for \(w_5\) and a 1 for \(w_6\).
- 20.
Which we discussed earlier, but we will restate it here: \(w_k^{new} = w_k^{old} - \eta \frac{\partial E}{\partial w_k}\).
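This update rule translates directly into code. As a minimal illustration, here it minimizes the one-dimensional error \(E(w) = (w-2)^2\), whose derivative is \(2(w-2)\); the error function, initial weight and learning rate are illustrative choices, not from the chapter:

```python
def dE_dw(w):
    """Derivative of the illustrative error E(w) = (w - 2)**2."""
    return 2.0 * (w - 2.0)

w = 0.0      # initial weight (a guessed estimate)
eta = 0.1    # learning rate

for _ in range(100):
    w = w - eta * dE_dw(w)   # w_new = w_old - eta * dE/dw
```

Each step moves the weight against the gradient, so `w` approaches the minimizer at 2; the same rule, applied coordinate-wise to every weight, is gradient descent on a network's error function.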
- 21.
Or full-batch if we use the whole training set.
- 22.
Which is equal to using a mini-batch of size 1.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Skansi, S. (2018). Feedforward Neural Networks. In: Introduction to Deep Learning. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-73004-2_4
DOI: https://doi.org/10.1007/978-3-319-73004-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73003-5
Online ISBN: 978-3-319-73004-2
eBook Packages: Computer Science, Computer Science (R0)