Abstract
This chapter introduces feedforward neural networks and the basic terminology of deep learning. It discusses how to represent these abstract, graphical objects as mathematical objects (vectors, matrices and tensors). Rosenblatt’s perceptron rule is presented in detail, which makes it clear that the rule cannot be extended to train a multilayered perceptron. The delta rule is presented as an alternative, and the idea of iterative weight-update procedures is developed with abundant examples, to build both an abstract and a numerical intuition. Backpropagation is explained in detail, with all the calculations for a simple example carried out, and the role of error functions in the whole system is examined. The chapter closes with the first example of Python code in Keras, with all the details of running Python and Keras explained (imports, Keras-specific functions and regular Python functions).
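The forward pass of a small feedforward network of the kind the chapter describes can be sketched in plain Python; the weights, biases and inputs below are illustrative, not taken from the chapter:

```python
import math

def sigmoid(z):
    """Logistic activation applied to a weighted sum."""
    return 1.0 / (1.0 + math.exp(-z))

def unit(x, weights, bias):
    """One unit: weighted sum of inputs plus bias, then activation."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# A tiny two-layer feedforward pass with made-up weights:
x = [0.5, 0.9]                             # input vector
h1 = unit(x, [0.3, -0.2], 0.1)             # hidden unit 1
h2 = unit(x, [0.6, 0.8], -0.1)             # hidden unit 2
y = unit([h1, h2], [1.0, -1.5], 0.05)      # output unit
```

Each unit computes exactly the weighted-sum-plus-activation the chapter formalizes with vectors and matrices; stacking such units gives the layered structure.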
Notes
- 1.
These models are called linear neurons.
- 2.
For linear neurons we keep the same notation but set \(y_{23}:=z_{23}\).
- 3.
Formally speaking, all units using the perceptron rule should be called perceptrons, not just binary threshold units.
- 4.
The target is also called expected value or true label, and it is usually denoted by t.
- 5.
As a simple application, think of an image recognition system for security cameras, where one needs to classify numbers seen regardless of their orientation.
- 6.
This is a modified version of an example given by Geoffrey Hinton.
- 7.
For example, if we only buy chicken, then it would be easy to get the price of the chicken analytically as \(total=price\cdot quantity\), and we get \(price=\frac{total}{quantity}\).
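When several components are bought together the analytic route no longer works, and the iterative alternative the chapter develops can be sketched as repeated small corrections of a guessed price, in the spirit of the delta rule; the quantities, totals and learning rate below are illustrative:

```python
# Iteratively estimate two unknown per-kg prices from observed totals:
# nudge each guess proportionally to the prediction error and the
# corresponding quantity (delta-rule-style update).
meals = [((2.0, 1.0), 13.0),   # (kg of item A, kg of item B), total paid
         ((1.0, 3.0), 14.0),
         ((3.0, 2.0), 21.0)]   # consistent with true prices A = 5, B = 3

prices = [1.0, 1.0]            # guessed estimates
eta = 0.01                     # learning rate

for _ in range(2000):
    for (qa, qb), total in meals:
        predicted = prices[0] * qa + prices[1] * qb
        error = total - predicted
        prices[0] += eta * error * qa
        prices[1] += eta * error * qb
```

After enough passes the guesses settle near the true per-kg prices, without ever solving the system analytically.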
- 8.
In practical terms this might seem far more complicated than simply asking the person serving you lunch the price per kilogram of each component, but you can imagine that the person is the soup vendor from the TV show Seinfeld (116th episode, i.e. S07E06).
- 9.
A guessed estimate. We use this term simply to note that, for now, we should keep things intuitive and not guess an initial value of, e.g., 12000, 4533233456 or 0.0000123, not because it would be impossible to solve, but because many more steps would be needed before the computation assumed a form in which we could see the regularities appear.
- 10.
Not in the sense that they are the same formula, but that they refer to the same process and that one can be derived from the other.
- 11.
For the sake of easy readability, we deliberately combine Newton and Leibniz notation in the rules, since some of them are more intuitive in one, while some of them are more intuitive in the second. We refer the reader back to Chap. 1 where all the formulations in both notations were given.
- 12.
Strictly speaking, we would need \(\frac{\partial E}{\partial y^{(n)}}\) but this generalization is trivial and we chose the simplification since we wanted to improve readability.
- 13.
A definition is circular if the same term occurs in both the definiendum (what is being defined) and definiens (with which it is defined), i.e. on both sides of \(=\) (or more precisely of \(:=\)) and in our case this term could be w. A recursive definition has the same term on both sides, but on the defining side (definiens) it has to be ‘smaller’ so that one could resolve the definition by going back to the starting point.
- 14.
If you recall, the perceptron rule also qualifies as a ‘simpler’ way of learning weights, but it had the major drawback that it cannot be generalized to multiple layers.
- 15.
Although it must be said that the whole field of deep learning is centered around overcoming the problems with gradient descent that arise when using it in deep networks.
- 16.
Cf. G. Hinton’s Coursera course, where this method is elaborated.
- 17.
We must then use the gradient, not individual partial derivatives.
- 18.
This is a modified version of the example by Matt Mazur available at https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/.
- 19.
The only difference is the step for \(\frac{\partial z_F}{\partial w_5}\), where there is a 0 now for \(w_5\) and a 1 for \(w_6\).
- 20.
Which we discussed earlier, but we will restate it here: \(w_k^{new} = w_k^{old} - \eta \frac{\partial E}{\partial w_k}\).
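This update rule translates directly into code. As a minimal illustration, here it minimizes the one-dimensional error \(E(w) = (w-2)^2\), whose derivative is \(2(w-2)\); the error function, initial weight and learning rate are illustrative choices, not from the chapter:

```python
def dE_dw(w):
    """Derivative of the illustrative error E(w) = (w - 2)**2."""
    return 2.0 * (w - 2.0)

w = 0.0      # initial weight (a guessed estimate)
eta = 0.1    # learning rate

for _ in range(100):
    w = w - eta * dE_dw(w)   # w_new = w_old - eta * dE/dw
```

Each step moves the weight against the gradient, so `w` approaches the minimizer at 2; the same rule, applied coordinate-wise to every weight, is gradient descent on a network's error function.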
- 21.
Or full-batch if we use the whole training set.
- 22.
Which is equal to using a mini-batch of size 1.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Skansi, S. (2018). Feedforward Neural Networks. In: Introduction to Deep Learning. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-73004-2_4
DOI: https://doi.org/10.1007/978-3-319-73004-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73003-5
Online ISBN: 978-3-319-73004-2
eBook Packages: Computer Science, Computer Science (R0)