
Neural Network Learning Algorithms

  • Terrence J. Sejnowski
Part of the Springer Study Edition book series (volume 41)

Abstract

The earliest network models of associative memory were based on correlations between input and output patterns of activity in linear processing units. These models have several attractive features: the synaptic strengths are computed from information available locally at each synapse in a single trial; the information is distributed across a large number of connection strengths; recall of stored information is associative; and the network generalizes to new input patterns that are similar to stored ones. This class of linear associative matrix models also has severe limitations, including interference between stored items, especially related ones, and an inability to make decisions that are contingent on several inputs. Neural network models and learning algorithms introduced recently overcome some of these shortcomings. These learning algorithms require many training examples to create the internal representations needed to perform a difficult task and to generalize properly, and they share some properties with human skill acquisition.
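The correlation-based storage and recall the abstract describes can be made concrete with a minimal sketch of a linear associative matrix memory (in the style of correlation-matrix memories). All dimensions, variable names, and the random patterns below are illustrative assumptions, not details from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 64, 32

# Two input/output pattern pairs to store; random high-dimensional
# inputs are nearly orthogonal, which keeps interference small.
f1 = rng.standard_normal(n_in); f1 /= np.linalg.norm(f1)
f2 = rng.standard_normal(n_in); f2 /= np.linalg.norm(f2)
g1 = rng.standard_normal(n_out)
g2 = rng.standard_normal(n_out)

# Storage: each synaptic strength W[i, j] is the correlation between
# output unit i and input unit j, summed over stored pairs -- a
# single-trial, purely local (outer-product) learning rule.
W = np.outer(g1, f1) + np.outer(g2, f2)

# Recall is one matrix-vector product: presenting f1 retrieves g1
# plus a cross-talk (interference) term proportional to (f2 . f1).
recall = W @ f1
crosstalk = (f2 @ f1) * g2
assert np.allclose(recall, g1 + crosstalk)

# A noisy version of f1 still recalls a pattern close to g1: the
# distributed, associative storage generalizes to similar inputs.
noisy = f1 + 0.05 * rng.standard_normal(n_in)
print(np.corrcoef(W @ noisy, g1)[0, 1])
```

The cross-talk term is exactly the interference between related stored items that the abstract identifies: it grows as the stored input patterns become more similar.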

Keywords

Input Pattern · Associative Memory · Hidden Unit · Output Unit · Synaptic Strength
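The abstract notes that the newer learning algorithms build internal representations in hidden units over many training examples. A minimal sketch of that idea is a tiny network trained by gradient descent with error back-propagation on XOR, a decision contingent on two inputs that no linear associative matrix can make. Layer sizes, learning rate, and iteration count here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # XOR targets

W1 = rng.standard_normal((2, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.standard_normal((4, 1))   # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

initial_mse = ((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2).mean()

lr = 1.0
for _ in range(5000):               # many presentations of the examples
    h = sigmoid(X @ W1 + b1)        # hidden-unit activities
    out = sigmoid(h @ W2 + b2)      # output-unit activity
    # Backward pass: propagate the output error to both weight layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(0)

print(out.ravel().round(2))  # outputs should approach the XOR targets
```

Unlike the single-trial outer-product rule, the weights here are adjusted incrementally over thousands of example presentations, which is the sense in which such algorithms resemble gradual human skill acquisition.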



Copyright information

© Springer-Verlag Berlin Heidelberg 1989

Authors and Affiliations

  • Terrence J. Sejnowski
    Department of Biophysics, Johns Hopkins University, Baltimore, USA
