Abstract
In this chapter, four types of neural networks are described: Radial Basis Function (RBF) networks, Self-Organizing Maps (SOM), Hopfield networks, and deep neural networks. The RBF network takes a different approach to network design: its single hidden layer is composed of neurons whose activations are radial basis functions (hence the name), which perform a nonlinear transformation of the input data supplied to the network. The SOM network, on the other hand, follows an unsupervised learning model; its originality lies in autonomously grouping input data on the basis of their similarity, evaluating not a convergence error against external information about the data but the quantization error on the map. The Hopfield network has a supervised learning model and the ability to store information and retrieve it from even partial content of the original information. The network is associated with an energy function to be minimized during its evolution through a succession of states, until a final state corresponding to a minimum of the energy function is reached. This feature allows it to be used to set up and solve an optimization problem by expressing the objective function as an energy function. The chapter concludes with a description of the foundations of convolutional neural networks (CNNs), based on the deep learning architecture and by far the most widespread approach since 2012.
Notes
- 1.
While still satisfying Cover's theorem described above, by "reasonable" we mean a size correlated with the computational complexity of the entire architecture.
- 2.
The D operator derives from the Z-transform applied to discrete signals \( y (n): n = 0,1,2,3, \ldots \) to obtain analytical solutions to difference equations. The delay unit is introduced simply to delay the activation signal until the next iteration.
- 3.
The Ising model (named after the physicist Ernst Ising, who proposed it) is a physical-mathematical model initially devised to describe a magnetized body starting from its elementary constituents. The model was later used to describe a variety of phenomena that share a common trait: individual components that, interacting in pairs, produce collective effects.
- 4.
In the context of neural networks, an attractor is the final configuration reached by a neural network that, starting from an initial state, settles into a stable state after a certain time. Given an attractor, the set of initial states whose evolution ends in that attractor is called its basin of attraction.
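The attractor idea can be made concrete with a minimal sketch of a Hopfield-style network (a sketch under assumed names and sizes, not the chapter's implementation): a pattern is stored with Hebbian weights, and any initial state in its basin of attraction, such as a slightly corrupted copy, evolves back to the stored pattern.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weights for bipolar (+1/-1) patterns; zero self-connections."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / patterns.shape[0]

def recall(W, state, max_steps=10):
    """Iterate synchronous updates until the state stops changing (an attractor)."""
    for _ in range(max_steps):
        new = np.sign(W @ state)
        new[new == 0] = 1          # break ties toward +1
        if np.array_equal(new, state):
            break
        state = new
    return state

pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])
W = train_hopfield(pattern[None, :])

noisy = pattern.copy()
noisy[0] = -noisy[0]               # corrupt one component: still in the basin
print(recall(W, noisy))            # converges back to the stored pattern
```

Flipping more components eventually leaves the basin of attraction, and the trajectory ends in a different attractor (for a single stored pattern, typically its sign-inverted copy).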
- 5.
Conventional memory devices store and retrieve information by referring to memory location addresses. Consequently, this mode of access often becomes a limiting factor for systems that require quick access to information. The time required to find an item stored in memory can be considerably reduced if the item can be identified by its contents rather than by its memory address. A memory accessed in this way is called a content-addressable memory (CAM). A CAM offers a performance advantage over other search schemes, such as binary-tree or lookup-table searches, because it compares the desired information against all stored entries simultaneously rather than walking through memory addresses one at a time.
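The content-based access mode can be sketched as follows (an illustrative emulation, not hardware): every stored row is compared against a search key at once, with a mask marking "don't-care" bits as in a ternary CAM, so retrieval succeeds from partial content.

```python
import numpy as np

# A toy content-addressable memory: retrieval by (partial) content,
# not by address. Hardware CAMs compare all rows in parallel; here we
# emulate that with a single vectorized match over every stored entry.
stored = np.array([
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
])

def cam_lookup(query, mask):
    """Return the first stored row matching `query` on the unmasked bits.

    `mask` marks which bits of the query are known (1) or don't-care (0),
    mimicking a ternary CAM search key.
    """
    matches = ((stored == query) | (mask == 0)).all(axis=1)
    idx = np.flatnonzero(matches)
    return stored[idx[0]] if idx.size else None

# Query with only the first three bits known; the rest are don't-cares.
key  = np.array([0, 1, 1, 0, 0, 0, 0, 0])
mask = np.array([1, 1, 1, 0, 0, 0, 0, 0])
print(cam_lookup(key, mask))   # retrieves the full second row
```

This is exactly the access pattern the Hopfield network realizes in a distributed way: partial content acts as the key, and the dynamics retrieve the complete stored item.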
- 6.
Lyapunov functions, named after the Russian mathematician Aleksandr Mikhailovich Lyapunov, are scalar functions that are used to study the stability of an equilibrium point of an ordinary autonomous differential equation, which normally describes a dynamic system. For dynamic physical systems, conservation laws often provide candidate Lyapunov functions.
- 7.
Recall that the sigmoid function \( \sigma (t) \) is nonlinear with a bounded range. In this context the following property is exploited:
$$\begin{aligned} \frac{d\sigma (t)}{\text {d}t}=\sigma (t)(1-\sigma (t)) \end{aligned}$$
a polynomial relation between the derivative and the function itself that is very simple to compute.
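The identity is easy to verify numerically; the following quick check (function names are illustrative) compares the polynomial form against a central finite difference.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

t = np.linspace(-5, 5, 101)
s = sigmoid(t)

# Analytic derivative via the polynomial identity sigma * (1 - sigma)...
analytic = s * (1 - s)

# ...compared with a central finite difference of the sigmoid itself.
h = 1e-6
numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))   # tiny: the identity holds
```

In back-propagation this means the derivative comes for free from the already-computed activation, with no extra evaluation of the exponential.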
- 8.
This stochastic optimization method, which attempts to find a global minimum in the presence of local minima, is known as simulated annealing. In essence, it simulates the physical process adopted in the heat treatment of ferrous materials (heating followed by slow cooling) to make them more resistant and less fragile. At high temperatures the atoms of these materials are excited, but during the slow cooling phase they have time to assume an optimal crystalline configuration, so that the material is free of irregularities and reaches a global minimum of the lattice energy. This annealing treatment can avoid local minima of the lattice energy because the dynamics of the particles include a temperature-dependent component. In fact, during cooling, the particles mostly lose energy but sometimes acquire it, thus entering states of higher energy. This phenomenon prevents the system from settling into shallower minima.
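A minimal sketch of the procedure, on an assumed one-dimensional multimodal energy landscape (the function, proposal width, and cooling schedule are all illustrative choices): downhill moves are always accepted, while uphill moves are accepted with probability \( e^{-\Delta E / T} \), which shrinks as the temperature is lowered.

```python
import math
import random

def energy(x):
    # Illustrative multimodal landscape with several local minima.
    return 0.1 * (x - 2) ** 2 + math.sin(3 * x)

def simulated_annealing(x0, t_start=5.0, t_end=1e-3, alpha=0.99, seed=0):
    rng = random.Random(seed)
    x, t = x0, t_start
    while t > t_end:
        candidate = x + rng.gauss(0, 0.5)          # random perturbation
        delta = energy(candidate) - energy(x)
        # Accept downhill moves always; uphill moves with probability
        # exp(-delta / t), which vanishes as the temperature t cools.
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = candidate
        t *= alpha                                  # geometric cooling
    return x

x = simulated_annealing(x0=0.0)
print(x, energy(x))
```

With the temperature held at zero the same loop degenerates into pure hill descent, which is exactly what gets trapped in the shallow minima the uphill moves are meant to escape.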
- 9.
We recall that the adaptation of the weights in the MLP occurs by differentiating an objective function that involves the sigmoid, whose derivative is less than 1. Applying the chain rule leads to multiplying many terms less than 1, with the consequent problem of considerably reducing the gradient values as one proceeds toward the layers furthest from the output.
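The effect can be illustrated numerically (a simplified sketch that tracks only the scalar chain of sigmoid-derivative factors, ignoring the weight matrices): since \( \sigma'(t) = \sigma(t)(1-\sigma(t)) \le 0.25 \), the product over layers decays at least geometrically.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# The sigmoid derivative sigma * (1 - sigma) peaks at 0.25 (at t = 0).
# Back-propagating through L layers multiplies L such factors, so even
# in the best case the gradient scale decays like 0.25 ** L.
rng = np.random.default_rng(0)
grad_scale = 1.0
for layer in range(10):
    t = rng.normal()              # a pre-activation value at this layer
    s = sigmoid(t)
    grad_scale *= s * (1 - s)     # chain-rule factor, at most 0.25

print(grad_scale)   # after 10 layers: already below 1e-6
```

This vanishing-gradient behavior is one motivation for rectified activations in deep networks, whose derivative is exactly 1 on the active side.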
- 10.
In the field of machine learning, there are two types of parameters: those learned during the training process (for example, the weights of a logistic regression, or of a synaptic connection between neurons), and the intrinsic parameters of a learning algorithm, whose optimization takes place separately. The latter are known as hyperparameters, that is, optimization parameters associated with a model (such as a regularization parameter, the depth of a decision tree, or, in the context of deep neural networks, the number of neuronal layers and other parameters that define the architecture of the network).
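The separate optimization of hyperparameters is often done by a search over candidate configurations scored on held-out data. A minimal grid-search sketch follows; the scoring function here is a hypothetical stand-in for actually training and validating a model, and the parameter names are illustrative.

```python
import itertools

# Hypothetical validation score as a function of two hyperparameters
# (learning rate and depth); in practice this value would come from
# training the model and evaluating it on held-out data.
def validation_score(lr, depth):
    return -abs(lr - 0.01) * 100 - abs(depth - 3)

grid = {
    "lr": [0.001, 0.01, 0.1],
    "depth": [2, 3, 4, 5],
}

# Enumerate every combination in the grid and keep the best-scoring one.
best = max(
    (dict(zip(grid, values)) for values in itertools.product(*grid.values())),
    key=lambda cfg: validation_score(**cfg),
)
print(best)   # {'lr': 0.01, 'depth': 3}
```

Note the division of labor: the model's weights are fitted inside each call to the scoring function, while the grid search optimizes the hyperparameters from the outside.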
- 11.
The heuristic of dropout is better understood through the following analogy. Imagine a patent office in which a single employee is the expert. As often happens, if this expert is always present, the other employees of the office have no incentive to acquire skills in patent procedures. But if the expert decides every day, in a stochastic way (for example, by tossing a coin), whether or not to go to work, the other employees, unable to block the activities of the office, are forced, even if only occasionally, to adapt by acquiring those skills. The office can therefore no longer rely on the single expert employee: all the other employees are forced to acquire his skills. A sort of collaboration among the various employees is thus generated, as needed, without their number being predefined. This makes the office much more flexible as a whole, increasing the quality and competence of its employees. In the jargon of neural networks, we would say that the network generalizes better.
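In code, the coin-toss of the analogy becomes a random binary mask over the units of a layer. A minimal sketch of the common "inverted dropout" variant (one of several equivalent formulations) is:

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors by 1 / (1 - p_drop), so the expected
    activation is unchanged and no rescaling is needed at test time."""
    if not training or p_drop == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p_drop   # the coin toss per unit
    return activations * mask / (1.0 - p_drop)

a = np.ones(8)
out = dropout(a, p_drop=0.5, rng=np.random.default_rng(0))
print(out)   # surviving units scaled to 2.0, the rest zeroed
```

At inference time (`training=False`) the layer passes activations through unchanged, since the scaling during training already preserved their expected value.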
© 2020 Springer Nature Switzerland AG
Distante, A., Distante, C. (2020). RBF, SOM, Hopfield, and Deep Neural Networks. In: Handbook of Image Processing and Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-030-42378-0_2
Print ISBN: 978-3-030-42377-3
Online ISBN: 978-3-030-42378-0