Centering Neural Network Gradient Factors

Chapter in: Neural Networks: Tricks of the Trade

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1524)

Abstract

It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [15]. Here we generalize this notion to all factors involved in the network’s gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network’s generalization ability.
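
The centering operations summarized above can be illustrated with a short numerical sketch. The following is a minimal, illustrative example rather than code from the chapter: a single tanh hidden layer with shortcut input-to-output connections, trained by batch gradient descent, in which inputs, hidden activities, errors, and activation slopes are each centered by subtracting their batch mean. The network sizes, variable names, and learning rate are assumptions made purely for illustration.

    import numpy as np

    # Illustrative sketch only (not the chapter's reference implementation):
    # one tanh hidden layer plus shortcut (input-to-output) connections.
    # Activity centering subtracts the batch mean from inputs and hidden
    # activations; error centering does the same for the output error; slope
    # centering subtracts the mean activation-function slope used in backprop.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 4))          # inputs (assumed toy data)
    y = rng.standard_normal((100, 1))          # regression targets
    W1 = rng.standard_normal((4, 8)) * 0.1     # input -> hidden weights
    W2 = rng.standard_normal((8, 1)) * 0.1     # hidden -> output weights
    Ws = rng.standard_normal((4, 1)) * 0.1     # shortcut: input -> output
    lr = 0.01                                  # assumed learning rate

    for step in range(1000):
        # forward pass with activity centering
        Xc = X - X.mean(axis=0)                # centered inputs
        h = np.tanh(Xc @ W1)                   # hidden activations
        hc = h - h.mean(axis=0)                # centered hidden activities
        out = hc @ W2 + Xc @ Ws                # output with shortcut path

        # backward pass with error and slope centering
        err = out - y                          # output error
        err_c = err - err.mean(axis=0)         # error centering
        slope = 1.0 - h**2                     # tanh slope f'(net)
        slope_c = slope - slope.mean(axis=0)   # slope centering: remove the
                                               # mean (linear) component
        delta_h = (err_c @ W2.T) * slope_c     # backpropagated hidden error

        # batch gradient descent updates
        W2 -= lr * hc.T @ err_c / len(X)
        Ws -= lr * Xc.T @ err_c / len(X)
        W1 -= lr * Xc.T @ delta_h / len(X)

In this sketch the mean slope is subtracted from the factor used to backpropagate the hidden-unit error, so the linear part of the mapping is left to the shortcut weights; this is the credit-assignment effect the abstract describes.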

References

  1. J. Anderson and E. Rosenfeld, editors. Neurocomputing: Foundations of Research. MIT Press, Cambridge, 1988.

  2. R. Battiti. Accelerated back-propagation learning: Two optimization methods. Complex Systems, 3:331–342, 1989.

  3. R. Battiti. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Computation, 4(2):141–166, 1992.

  4. E. Bienenstock, L. Cooper, and P. Munro. Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2, 1982. Reprinted in [1].

  5. D. H. Deterding. Speaker Normalisation for Automatic Speech Recognition. PhD thesis, University of Cambridge, 1989.

  6. M. Finke and K.-R. Müller. Estimating a-posteriori probabilities using stochastic network models. In M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman, and A. S. Weigend, editors, Proceedings of the 1993 Connectionist Models Summer School, Boulder, CO, 1994. Lawrence Erlbaum Associates, Hillsdale, NJ.

  7. T. J. Hastie and R. J. Tibshirani. Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):607–616, 1996.

  8. M. Herrmann. On the merits of topography in neural maps. In T. Kohonen, editor, Proceedings of the Workshop on Self-Organizing Maps, pages 112–117. Helsinki University of Technology, 1997.

  9. S. Hochreiter and J. Schmidhuber. Feature extraction through lococode. To appear in Neural Computation, 1998.

  10. N. Intrator. Feature extraction using an unsupervised neural network. Neural Computation, 4(1):98–107, 1992.

  11. A. Lapedes and R. Farber. A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition. Physica D, 22:247–259, 1986.

  12. Y. LeCun, I. Kanter, and S. A. Solla. Eigenvalues of covariance matrices: Application to neural-network learning. Physical Review Letters, 66(18):2396–2399, 1991.

  13. A. J. Robinson. Dynamic Error Propagation Networks. PhD thesis, University of Cambridge, 1989.

  14. N. N. Schraudolph and T. J. Sejnowski. Unsupervised discrimination of clustered data via optimization of binary information gain. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 499–506. Morgan Kaufmann, San Mateo, CA, 1993.

  15. N. N. Schraudolph and T. J. Sejnowski. Tempering backpropagation networks: Not all weights are created equal. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 563–569. The MIT Press, Cambridge, MA, 1996.

  16. T. J. Sejnowski. Storing covariance with nonlinearly interacting neurons. Journal of Mathematical Biology, 4:303–321, 1977.

  17. S. Shah, F. Palmieri, and M. Datum. Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Networks, 5:779–787, 1992.

  18. J. B. Tenenbaum and W. T. Freeman. Separating style and content. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, pages 662–668. The MIT Press, Cambridge, MA, 1997.

  19. P. D. Turney. Exploiting context when learning to classify. In Proceedings of the European Conference on Machine Learning, pages 402–407, 1993.

  20. P. D. Turney. Robust classification with context-sensitive features. In Proceedings of the Sixth International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pages 268–276, 1993.

  21. T. P. Vogl, J. K. Mangis, A. K. Rigler, W. T. Zink, and D. L. Alkon. Accelerating the convergence of the back-propagation method. Biological Cybernetics, 59:257–263, 1988.

  22. B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, Jr. Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proceedings of the IEEE, 64(8):1151–1162, 1976.

  23. H. G. Zimmermann. Neuronale Netze als Entscheidungskalkül. In H. Rehkugler and H. G. Zimmermann, editors, Neuronale Netze in der Ökonomie: Grundlagen und finanzwirtschaftliche Anwendungen, pages 1–87. Vahlen Verlag, Munich, 1994.

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Schraudolph, N.N. (1998). Centering Neural Network Gradient Factors. In: Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_11

  • DOI: https://doi.org/10.1007/3-540-49430-8_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65311-0

  • Online ISBN: 978-3-540-49430-0
