
RBF, SOM, Hopfield, and Deep Neural Networks

Handbook of Image Processing and Computer Vision

Abstract

In this chapter, four different types of neural networks are described: Radial Basis Function (RBF) networks, Self-Organizing Maps (SOM), the Hopfield network, and deep neural networks. The RBF network takes a different approach to network design, based on a single hidden layer whose neurons compute radial basis functions (hence the name) and thus apply a nonlinear transformation to the input data supplied to the network. The SOM network, on the other hand, follows an unsupervised learning model: its originality lies in autonomously grouping the input data on the basis of their similarity, evaluating not a convergence error against external information about the data but the quantization error on the network map. The Hopfield network follows a supervised learning model and is able to store information and retrieve it even from partial content of the original information. The network is associated with an energy function to be minimized during its evolution through a succession of states, until a final state is reached that corresponds to a minimum of the energy function. This feature allows it to be used to set up and solve optimization problems by expressing the objective function as an energy function. The chapter concludes with a description of the foundations of the methods based on convolutional neural networks (CNNs), by now the most widespread since 2012, which rely on the deep learning architecture.


Notes

  1.

    While always satisfying Cover's theorem described above, by "reasonably" we mean a size correlated with the computational complexity of the entire architecture.

  2.

    The D operator derives from the Z-transform, which is applied to discrete signals \( y(n),\; n = 0,1,2,3, \ldots \) to obtain analytical solutions of difference equations. The delay unit is introduced simply to delay the activation signal until the next iteration.

  3.

    The Ising model (named after the physicist Ernst Ising, who proposed it) is a physical-mathematical model originally devised to describe a magnetized body in terms of its elementary constituents. The model was later used to describe a variety of phenomena that share the presence of individual components which, by interacting in pairs, produce collective effects (a minimal statement of its energy function is given after these notes).

  4.

    In the context of neural networks, an attractor is the final configuration reached by a network that, starting from an initial state, settles into a stable state after a certain time. Given an attractor, the set of initial states from which the network's evolution ends in that attractor is called its basin of attraction.

  5.

    Conventional memory devices store and retrieve information by referring to memory location addresses. Consequently, this mode of access often becomes a limiting factor for systems that require fast access to information. The time needed to find an item stored in memory can be reduced considerably if the item can be identified by its content rather than by its memory address. A memory accessed in this way is called a content-addressable memory (CAM). A CAM offers a performance advantage over other search schemes, such as binary-tree or look-up-table searches, because the desired information is compared against the entire set of stored entries rather than being located through a sequence of address lookups (a minimal sketch of content-addressable recall with a Hopfield network is given after these notes).

  6.

    Lyapunov functions, named after the Russian mathematician Aleksandr Mikhailovich Lyapunov, are scalar functions used to study the stability of an equilibrium point of an autonomous ordinary differential equation, which typically describes a dynamical system. For physical dynamical systems, conservation laws often provide candidate Lyapunov functions (a short worked example is given after these notes).

  7.

    Recall that the sigmoid function \( \sigma (t) \) is nonlinear, with a bounded output range. In this context the following property is exploited

    $$\begin{aligned} \frac{d\sigma (t)}{\text {d}t}=\sigma (t)(1-\sigma (t)), \end{aligned}$$

    a polynomial relation between the derivative and the function itself that is very simple to compute (a short derivation is given after these notes).

  8.

    This stochastic optimization method, which attempts to find a global minimum in the presence of local minima, is known as simulated annealing. In essence, it simulates the heat treatment of ferrous materials (annealing: heating followed by slow, controlled cooling) that makes them more resistant and less brittle. At high temperature the atoms of the material are excited, and during the slow cooling phase they have time to settle into an optimal crystalline configuration, so that the material is free of irregularities and the lattice energy reaches a global minimum. This treatment can avoid local minima of the lattice energy because the dynamics of the particles include a temperature-dependent random component: during cooling the particles mostly lose energy, but occasionally they gain energy and enter higher-energy states. This mechanism prevents the system from getting trapped in shallow minima (a minimal sketch of the algorithm is given after these notes).

  9.

    Recall that the adaptation of the weights in the MLP is obtained by differentiating an objective function that involves the sigmoid function, whose derivative is always less than 1 (in fact at most 0.25). Applying the chain rule therefore multiplies together many terms smaller than 1, with the consequence that the gradient values shrink considerably as one proceeds toward the layers farthest from the output (a small numerical illustration is given after these notes).

  10.

    In the field of machine learning, there are two types of parameters: those that are learned during the training process (for example, the weights of a logistic regression or of a synaptic connection between neurons), and the intrinsic parameters of a learning algorithm, whose optimization takes place separately. The latter are known as hyperparameters, i.e., optimization parameters associated with a model (for example a regularization parameter, the depth of a decision tree, or, in the context of deep neural networks, the number of neuronal layers and the other parameters that define the architecture of the network); see the sketch after these notes.

  11.

    The heuristic of dropout is better understood through the following analogy. Imagine a patent office with a single expert employee. As often happens, if this expert is always present, the other employees have no incentive to acquire skills in patent procedures. But if the expert decides every day, in a stochastic way (for example, by tossing a coin), whether or not to come to work, the other employees, who cannot let the office grind to a halt, are forced, at least occasionally, to adapt by acquiring those skills. The office can therefore no longer rely on a single experienced employee: all the others are forced to become competent, and a form of collaboration among varying subsets of employees emerges, without their number being fixed in advance. This makes the office as a whole much more flexible, increasing the quality and competence of the employees. In the jargon of neural networks, we would say that the network generalizes better (a minimal sketch of inverted dropout is given after these notes).
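
To make the pairwise-interaction picture of Note 3 concrete, the standard Ising energy (a textbook formulation stated here only for reference, not taken from the chapter) for binary spins \( s_i \in \{-1,+1\} \), couplings \( J_{ij} \), and external fields \( h_i \) is

$$\begin{aligned} E(\mathbf {s}) = -\sum _{i<j} J_{ij}\, s_i s_j - \sum _i h_i\, s_i, \end{aligned}$$

a quadratic form of the same kind as the energy function associated with the Hopfield network in this chapter, with neuron states playing the role of spins and synaptic weights the role of couplings.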
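
As a complement to Note 5, the following is a minimal sketch (not the chapter's implementation) of content-addressable recall with a small Hopfield network: a few random bipolar patterns are stored with a Hebbian rule and one of them is retrieved from a corrupted cue, i.e., by content rather than by address. The pattern size, the number of stored patterns, the corruption level, and the asynchronous sign-update schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store a few random bipolar patterns (+1/-1) with the Hebbian rule.
n_units, n_patterns = 64, 3
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))
W = (patterns.T @ patterns).astype(float) / n_units
np.fill_diagonal(W, 0.0)  # no self-connections

def recall(cue, n_iters=10):
    """Asynchronous updates: each unit takes the sign of its local field."""
    s = cue.copy()
    for _ in range(n_iters):
        for i in rng.permutation(n_units):
            h = W[i] @ s
            s[i] = 1 if h >= 0 else -1
    return s

# Corrupt 15% of the bits of the first stored pattern, then recall by content.
cue = patterns[0].copy()
flip = rng.choice(n_units, size=int(0.15 * n_units), replace=False)
cue[flip] *= -1

retrieved = recall(cue)
print("fraction of correctly recalled bits:", (retrieved == patterns[0]).mean())
```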
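
As a short worked example for Note 6 (a standard textbook case, not taken from the chapter): for the scalar autonomous system \( \dot{x} = -x^{3} \), the candidate Lyapunov function \( V(x) = \tfrac{1}{2}x^{2} \) is positive definite and

$$\begin{aligned} \dot{V}(x) = x\,\dot{x} = -x^{4} \le 0, \end{aligned}$$

so the equilibrium \( x = 0 \) is stable (indeed asymptotically stable, since \( \dot{V} < 0 \) for \( x \ne 0 \)). This is exactly the role played by the energy function of the Hopfield network, which decreases along the network's evolution.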
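
For completeness, the identity used in Note 7 follows directly from the definition of the logistic sigmoid \( \sigma (t) = 1/(1+e^{-t}) \):

$$\begin{aligned} \frac{d\sigma (t)}{\text {d}t} = \frac{e^{-t}}{(1+e^{-t})^{2}} = \frac{1}{1+e^{-t}}\cdot \frac{e^{-t}}{1+e^{-t}} = \sigma (t)\bigl (1-\sigma (t)\bigr ), \end{aligned}$$

since \( 1-\sigma (t) = e^{-t}/(1+e^{-t}) \).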
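
The following is a minimal sketch of the simulated annealing scheme described in Note 8, applied to a simple one-dimensional function with several local minima. The cost function, the geometric cooling schedule, the step size, and the starting point are illustrative assumptions, not the chapter's algorithm.

```python
import math
import random

random.seed(0)

def cost(x):
    # A simple multimodal test function; its global minimum lies near x = -0.5.
    return x * x + 10.0 * math.sin(3.0 * x)

def simulated_annealing(x0, T0=5.0, cooling=0.95, steps_per_T=50, T_min=1e-3):
    x, best = x0, x0
    T = T0
    while T > T_min:
        for _ in range(steps_per_T):
            candidate = x + random.gauss(0.0, 1.0)      # random move
            delta = cost(candidate) - cost(x)
            # Metropolis rule: always accept improvements; accept worse moves
            # with probability exp(-delta / T), which shrinks as T decreases.
            if delta < 0 or random.random() < math.exp(-delta / T):
                x = candidate
            if cost(x) < cost(best):
                best = x
        T *= cooling                                    # slow "cooling"
    return best

x_star = simulated_annealing(x0=8.0)
print(f"approximate minimizer: {x_star:.3f}, cost: {cost(x_star):.3f}")
```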
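
As a small numerical illustration of the effect described in Note 9 (a toy chain of scalar sigmoid units with randomly drawn weights, not the chapter's network): the backpropagated gradient contains one factor \( \sigma '(z) \le 0.25 \) per layer, so its magnitude tends to shrink geometrically with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy chain of scalar sigmoid units: a_l = sigmoid(w_l * a_{l-1}).
# By the chain rule, d(output)/d(input) is the product over layers of
# w_l * sigmoid'(z_l), where sigmoid'(z_l) = a_l * (1 - a_l) <= 0.25.
for depth in (2, 5, 10, 20):
    w = rng.normal(0.0, 1.0, size=depth)
    a, grad = 0.5, 1.0
    for l in range(depth):
        z = w[l] * a
        a = sigmoid(z)
        grad *= w[l] * a * (1.0 - a)
    print(f"depth {depth:2d}: |d output / d input| = {abs(grad):.2e}")
```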
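
To make the distinction in Note 10 concrete, this hypothetical snippet fits a one-variable linear model by gradient descent: the weight and bias are parameters learned from the data, whereas the learning rate and the number of epochs are hyperparameters fixed before training (their values here are arbitrary choices for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 plus noise.
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=100)

# Hyperparameters: chosen before training, not learned from the data.
learning_rate = 0.1
n_epochs = 200

# Parameters: learned during training by gradient descent on the MSE.
w, b = 0.0, 0.0
for _ in range(n_epochs):
    y_hat = w * x + b
    grad_w = 2.0 * np.mean((y_hat - y) * x)   # d MSE / d w
    grad_b = 2.0 * np.mean(y_hat - y)         # d MSE / d b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned parameters: w = {w:.2f}, b = {b:.2f}")
```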
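
Finally, a minimal sketch of the dropout mechanism behind the analogy in Note 11, in the common "inverted dropout" form (the layer sizes and drop probability are illustrative; this is not the chapter's or any specific library's implementation): at training time each activation is kept with probability \( 1-p \) and rescaled, while at test time the layer is used as is.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout: zero each unit with probability p_drop and rescale
    the survivors by 1/(1 - p_drop), so the expected activation is unchanged
    and no rescaling is needed at test time."""
    if not training or p_drop == 0.0:
        return activations
    keep_prob = 1.0 - p_drop
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# Example: a batch of 4 samples with 8 hidden activations each.
h = rng.normal(size=(4, 8))
h_train = dropout(h, p_drop=0.5, training=True)    # random units silenced
h_test = dropout(h, p_drop=0.5, training=False)    # identity at test time
print(np.round(h_train, 2))
```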


Author information

Correspondence to Arcangelo Distante.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Distante, A., Distante, C. (2020). RBF, SOM, Hopfield, and Deep Neural Networks. In: Handbook of Image Processing and Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-030-42378-0_2


  • DOI: https://doi.org/10.1007/978-3-030-42378-0_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-42377-3

  • Online ISBN: 978-3-030-42378-0

  • eBook Packages: Computer Science, Computer Science (R0)
