Abstract
In this chapter, four types of neural networks are described: Radial Basis Function (RBF) networks, Self-Organizing Maps (SOM), Hopfield networks, and deep neural networks. The RBF network takes a different approach to network design: its single hidden layer is composed of neurons whose activations are radial basis functions (hence the name), which perform a nonlinear transformation of the input data supplied to the network. The SOM network, on the other hand, follows an unsupervised learning model; its originality lies in autonomously grouping input data on the basis of their similarity, evaluating not a convergence error against external information about the data but the quantization error on the map. The Hopfield network has a supervised learning model and the ability to store information and retrieve it from even partial content of the original information. The network is associated with an energy function to be minimized during its evolution through a succession of states, until a final state corresponding to a minimum of the energy function is reached. This feature allows it to be used to set up and solve an optimization problem by expressing the objective function as an energy function. The chapter concludes with a description of the foundations of convolutional neural networks (CNNs), based on the deep learning architecture and by far the most widespread approach since 2012.
Notes
- 1.
While still satisfying Cover's theorem described above, by "reasonable" we mean a size correlated with the computational complexity of the entire architecture.
- 2.
The D operator derives from the Z-transform applied to discrete signals \( y (n): n = 0,1,2,3, \ldots \) to obtain analytical solutions to difference equations. The delay unit is introduced simply to delay the activation signal until the next iteration.
- 3.
The Ising model (named after the physicist Ernst Ising, who proposed it) is a physical-mathematical model initially devised to describe a magnetized body starting from its elementary constituents. The model was later used to describe a variety of phenomena that share a common trait: individual components that, interacting in pairs, produce collective effects.
- 4.
In the context of neural networks, an attractor is the final configuration reached by a neural network that, starting from an initial state, settles into a stable state after a certain time. Given an attractor, the set of initial states whose evolution ends in that attractor is called its basin of attraction.
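The attractor idea can be made concrete with a minimal sketch of a Hopfield-style network (a sketch under assumed names and sizes, not the chapter's implementation): a pattern is stored with Hebbian weights, and any initial state in its basin of attraction, such as a slightly corrupted copy, evolves back to the stored pattern.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for bipolar (+1/-1) patterns; zero self-connections."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / patterns.shape[0]

def recall(W, state, max_steps=10):
    """Iterate synchronous updates until the state stops changing (an attractor)."""
    for _ in range(max_steps):
        new = np.sign(W @ state)
        new[new == 0] = 1          # break ties toward +1
        if np.array_equal(new, state):
            break
        state = new
    return state

pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])
W = train_hopfield(pattern[None, :])

noisy = pattern.copy()
noisy[0] = -noisy[0]               # corrupt one component: still in the basin
print(recall(W, noisy))            # converges back to the stored pattern
```

Flipping more components eventually leaves the basin of attraction, and the trajectory ends in a different attractor (for a single stored pattern, typically its sign-inverted copy).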
- 5.
Conventional memory devices store and retrieve information by referring to memory location addresses. Consequently, this mode of access often becomes a limiting factor for systems that require quick access to information. The time required to find an item stored in memory can be considerably reduced if the item can be identified by its contents rather than by its memory address. A memory accessed in this way is called a content-addressable memory (CAM). A CAM offers a performance advantage over other search schemes, such as binary-tree or lookup-table searches, because it compares the desired information against all stored entries simultaneously rather than walking through memory addresses one at a time.
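The content-based access mode can be sketched as follows (an illustrative emulation, not hardware): every stored row is compared against a search key at once, with a mask marking "don't-care" bits as in a ternary CAM, so retrieval succeeds from partial content.

```python
import numpy as np

# A toy content-addressable memory: retrieval by (partial) content,
# not by address. Hardware CAMs compare all rows in parallel; here we
# emulate that with a single vectorized match over every stored entry.
stored = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
])

def cam_lookup(query, mask):
    """Return the first stored row matching `query` on the unmasked bits.

    `mask` marks which bits of the query are known (1) or don't-care (0),
    mimicking a ternary CAM search key.
    """
    matches = ((stored == query) | (mask == 0)).all(axis=1)
    idx = np.flatnonzero(matches)
    return stored[idx[0]] if idx.size else None

# Query with only the first three bits known; the rest are don't-cares.
key  = np.array([0, 1, 1, 0, 0, 0, 0, 0])
mask = np.array([1, 1, 1, 0, 0, 0, 0, 0])
print(cam_lookup(key, mask))   # retrieves the full second row
```

This is exactly the access pattern the Hopfield network realizes in a distributed way: partial content acts as the key, and the dynamics retrieve the complete stored item.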
- 6.
Lyapunov functions, named after the Russian mathematician Aleksandr Mikhailovich Lyapunov, are scalar functions that are used to study the stability of an equilibrium point of an ordinary autonomous differential equation, which normally describes a dynamic system. For dynamic physical systems, conservation laws often provide candidate Lyapunov functions.
- 7.
Recall that the sigmoid function \( \sigma (t) \) is nonlinear with a bounded range. In this context the following property is exploited:
$$\begin{aligned} \frac{d\sigma (t)}{\text {d}t}=\sigma (t)(1-\sigma (t)) \end{aligned}$$
a polynomial relation between the derivative and the function itself that is very simple to compute.
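The identity is easy to verify numerically; the following quick check (function names are illustrative) compares the polynomial form against a central finite difference.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

t = np.linspace(-5, 5, 101)
s = sigmoid(t)

# Analytic derivative via the polynomial identity sigma * (1 - sigma)...
analytic = s * (1 - s)

# ...compared with a central finite difference of the sigmoid itself.
h = 1e-6
numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))   # tiny: the identity holds
```

In back-propagation this means the derivative comes for free from the already-computed activation, with no extra evaluation of the exponential.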
- 8.
This stochastic optimization method, which attempts to find a global minimum in the presence of local minima, is known as simulated annealing. In essence, it simulates the physical process adopted in the heat treatment of ferrous materials (heating followed by slow cooling) to make them more resistant and less fragile. At high temperatures the atoms of these materials are excited, but during the slow cooling phase they have time to assume an optimal crystalline configuration, so that the material is free of irregularities and reaches a global minimum of the lattice energy. This annealing treatment can avoid local minima of the lattice energy because the dynamics of the particles include a temperature-dependent component. In fact, during cooling, the particles mostly lose energy but sometimes acquire it, thus entering states of higher energy. This phenomenon prevents the system from settling into shallower minima.
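A minimal sketch of the procedure, on an assumed one-dimensional multimodal energy landscape (the function, proposal width, and cooling schedule are all illustrative choices): downhill moves are always accepted, while uphill moves are accepted with probability \( e^{-\Delta E / T} \), which shrinks as the temperature is lowered.

```python
import math
import random

def energy(x):
    # Illustrative multimodal landscape with several local minima.
    return 0.1 * (x - 2) ** 2 + math.sin(3 * x)

def simulated_annealing(x0, t_start=5.0, t_end=1e-3, alpha=0.99, seed=0):
    rng = random.Random(seed)
    x, t = x0, t_start
    while t > t_end:
        candidate = x + rng.gauss(0, 0.5)          # random perturbation
        delta = energy(candidate) - energy(x)
        # Accept downhill moves always; uphill moves with probability
        # exp(-delta / t), which vanishes as the temperature t cools.
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = candidate
        t *= alpha                                  # geometric cooling
    return x

x = simulated_annealing(x0=0.0)
print(x, energy(x))
```

With the temperature held at zero the same loop degenerates into pure hill descent, which is exactly what gets trapped in the shallow minima the uphill moves are meant to escape.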
- 9.
We recall that the adaptation of the weights in the MLP occurs by differentiating an objective function that involves the sigmoid, whose derivative is less than 1. Applying the chain rule leads to multiplying many terms less than 1, with the consequent problem of considerably reducing the gradient values as one proceeds toward the layers furthest from the output.
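The effect can be illustrated numerically (a simplified sketch that tracks only the scalar chain of sigmoid-derivative factors, ignoring the weight matrices): since \( \sigma'(t) = \sigma(t)(1-\sigma(t)) \le 0.25 \), the product over layers decays at least geometrically.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# The sigmoid derivative sigma * (1 - sigma) peaks at 0.25 (at t = 0).
# Back-propagating through L layers multiplies L such factors, so even
# in the best case the gradient scale decays like 0.25 ** L.
rng = np.random.default_rng(0)
grad_scale = 1.0
for layer in range(10):
    t = rng.normal()              # a pre-activation value at this layer
    s = sigmoid(t)
    grad_scale *= s * (1 - s)     # chain-rule factor, at most 0.25

print(grad_scale)   # after 10 layers: already below 1e-6
```

This vanishing-gradient behavior is one motivation for rectified activations in deep networks, whose derivative is exactly 1 on the active side.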
- 10.
In the field of machine learning, there are two types of parameters: those learned during the training process (for example, the weights of a logistic regression, or of a synaptic connection between neurons), and the intrinsic parameters of a learning algorithm, whose optimization takes place separately. The latter are known as hyperparameters, that is, optimization parameters associated with a model (such as a regularization parameter, the depth of a decision tree, or, in the context of deep neural networks, the number of neuronal layers and other parameters that define the architecture of the network).
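The separate optimization of hyperparameters is often done by a search over candidate configurations scored on held-out data. A minimal grid-search sketch follows; the scoring function here is a hypothetical stand-in for actually training and validating a model, and the parameter names are illustrative.

```python
import itertools

# Hypothetical validation score as a function of two hyperparameters
# (learning rate and depth); in practice this value would come from
# training the model and evaluating it on held-out data.
def validation_score(lr, depth):
    return -abs(lr - 0.01) * 100 - abs(depth - 3)

grid = {
    "lr": [0.001, 0.01, 0.1],
    "depth": [2, 3, 4, 5],
}

# Enumerate every combination in the grid and keep the best-scoring one.
best = max(
    (dict(zip(grid, values)) for values in itertools.product(*grid.values())),
    key=lambda cfg: validation_score(**cfg),
)
print(best)   # {'lr': 0.01, 'depth': 3}
```

Note the division of labor: the model's weights are fitted inside each call to the scoring function, while the grid search optimizes the hyperparameters from the outside.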
- 11.
The heuristic of dropout is better understood through the following analogy. Imagine a patent office in which a single employee is the expert. As often happens, if this expert is always present, the other employees of the office have no incentive to acquire skills in patent procedures. But if the expert decides every day, in a stochastic way (for example, by tossing a coin), whether or not to go to work, the other employees, unable to block the activities of the office, are forced, even if only occasionally, to adapt by acquiring those skills. The office can therefore no longer rely on the single expert employee: all the other employees are forced to acquire his skills. A sort of collaboration among the various employees is thus generated, as needed, without their number being predefined. This makes the office much more flexible as a whole, increasing the quality and competence of its employees. In the jargon of neural networks, we would say that the network generalizes better.
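In code, the coin-toss of the analogy becomes a random binary mask over the units of a layer. A minimal sketch of the common "inverted dropout" variant (one of several equivalent formulations) is:

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors by 1 / (1 - p_drop), so the expected
    activation is unchanged and no rescaling is needed at test time."""
    if not training or p_drop == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p_drop   # the coin toss per unit
    return activations * mask / (1.0 - p_drop)

a = np.ones(8)
out = dropout(a, p_drop=0.5, rng=np.random.default_rng(0))
print(out)   # surviving units scaled to 2.0, the rest zeroed
```

At inference time (`training=False`) the layer passes activations through unchanged, since the scaling during training already preserved their expected value.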
© 2020 Springer Nature Switzerland AG
Distante, A., Distante, C. (2020). RBF, SOM, Hopfield, and Deep Neural Networks. In: Handbook of Image Processing and Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-030-42378-0_2
Print ISBN: 978-3-030-42377-3
Online ISBN: 978-3-030-42378-0