ICANN ’93, pp 533–538

Guaranteed Convergence of Learning in Neural Networks

  • Tom M. Heskes
Conference paper


This paper describes schedules for the learning parameter that guarantee convergence to the optimal solution. It focuses on the difference between local and global optimization, i.e., learning in the presence of just one minimum versus learning in the presence of several minima. In the case of one minimum, the fastest admissible cooling is an algebraic function of the number of learning steps, whereas in the case of several minima the cooling must be “exponentially slow”.
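The two regimes in the abstract can be illustrated with a small sketch. The schedules below are standard forms consistent with the claim, not taken from the paper itself: an algebraic schedule η(t) ∝ 1/t for a single minimum, and a logarithmic schedule η(t) ∝ 1/log t, the analogue of “exponentially slow” cooling in simulated annealing, for several minima. The toy quadratic error potential and all parameter values are hypothetical.

```python
import math
import random

def algebraic_schedule(t, eta0=0.5, tau=10.0):
    # Algebraic cooling, eta(t) ~ 1/t: the fastest schedule that still
    # guarantees convergence when the error potential has one minimum.
    return eta0 / (1.0 + t / tau)

def logarithmic_schedule(t, c=1.0, t0=2.0):
    # "Exponentially slow" cooling, eta(t) ~ c / log t, of the kind
    # required to reach the global minimum among several minima.
    return c / math.log(t + t0)

def noisy_sgd(schedule, steps=20000, seed=0):
    # Toy on-line learning problem: minimise E(w) = (w - 1)^2 / 2 from
    # noisy gradient samples g = (w - 1) + noise, annealing eta with
    # the given schedule.
    rng = random.Random(seed)
    w = 5.0
    for t in range(steps):
        g = (w - 1.0) + rng.gauss(0.0, 1.0)
        w -= schedule(t) * g
    return w

# With algebraic cooling the iterate should drift toward w* = 1.
print(noisy_sgd(algebraic_schedule))
```

With a single quadratic minimum the algebraic schedule already satisfies the usual stochastic-approximation conditions (the step sizes sum to infinity while their squares do not); the slower logarithmic schedule only becomes necessary when local minima must be escaped.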







Copyright information

© Springer-Verlag London Limited 1993

Authors and Affiliations

  • Tom M. Heskes
  1. Department of Medical Physics and Biophysics, University of Nijmegen, Nijmegen, The Netherlands
