Abstract
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [15]. Here we generalize this notion to all factors involved in the network’s gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network’s generalization ability.
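As an illustration of the idea, here is a minimal numpy sketch (our own, not taken from the chapter) of one forward/backward pass through a single-hidden-layer network with shortcut connections. Inputs and hidden activities are centered by subtracting their batch means, and the hidden deltas use slope-centered activation derivatives, so the linear component of the backpropagated error is left for the shortcut weights to absorb. All variable names and the toy dimensions are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 8 samples, 4 inputs, 5 hidden units, 2 outputs.
X = rng.normal(size=(8, 4))
T = rng.normal(size=(8, 2))

W1 = rng.normal(scale=0.5, size=(4, 5))   # input -> hidden
W2 = rng.normal(scale=0.5, size=(5, 2))   # hidden -> output
Ws = rng.normal(scale=0.5, size=(4, 2))   # input -> output shortcut

# --- Forward pass with input and activity centering ---
Xc = X - X.mean(axis=0)                   # center inputs about zero
net = Xc @ W1
H = np.tanh(net)
Hc = H - H.mean(axis=0)                   # center hidden activities
Y = Hc @ W2 + Xc @ Ws                     # shortcut carries the linear part

# --- Backward pass with slope centering ---
dY = Y - T                                # output error (squared loss)
slope = 1.0 - np.tanh(net) ** 2           # tanh'(net)
slope_c = slope - slope.mean(axis=0)      # center the slope per hidden unit
dH = (dY @ W2.T) * slope_c                # linear error component removed

# Weight gradients; the shortcut receives the linear credit directly.
gW2 = Hc.T @ dY
gWs = Xc.T @ dY
gW1 = Xc.T @ dH
```

After centering, the per-unit batch means of both the hidden activities and the activation slopes are zero by construction, which is what improves the conditioning of the gradient.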
References
J. Anderson and E. Rosenfeld, editors. Neurocomputing: Foundations of Research. MIT Press, Cambridge, 1988.
R. Battiti. Accelerated back-propagation learning: Two optimization methods. Complex Systems, 3:331–342, 1989.
R. Battiti. First- and second-order methods for learning: Between steepest descent and Newton’s method. Neural Computation, 4(2):141–166, 1992.
E. Bienenstock, L. Cooper, and P. Munro. Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2, 1982. Reprinted in [1].
D. H. Deterding. Speaker Normalisation for Automatic Speech Recognition. PhD thesis, University of Cambridge, 1989.
M. Finke and K.-R. Müller. Estimating a-posteriori probabilities using stochastic network models. In M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman, and A. S. Weigend, editors, Proceedings of the 1993 Connectionist Models Summer School, Boulder, CO, 1994. Lawrence Erlbaum Associates, Hillsdale, NJ.
T. J. Hastie and R. J. Tibshirani. Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):607–616, 1996.
M. Herrmann. On the merits of topography in neural maps. In T. Kohonen, editor, Proceedings of the Workshop on Self-Organizing Maps, pages 112–117. Helsinki University of Technology, 1997.
S. Hochreiter and J. Schmidhuber. Feature extraction through lococode. To appear in Neural Computation, 1998.
N. Intrator. Feature extraction using an unsupervised neural network. Neural Computation, 4(1):98–107, 1992.
A. Lapedes and R. Farber. A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition. Physica D, 22:247–259, 1986.
Y. LeCun, I. Kanter, and S. A. Solla. Eigenvalues of covariance matrices: Application to neural-network learning. Physical Review Letters, 66(18):2396–2399, 1991.
A. J. Robinson. Dynamic Error Propagation Networks. PhD thesis, University of Cambridge, 1989.
N. N. Schraudolph and T. J. Sejnowski. Unsupervised discrimination of clustered data via optimization of binary information gain. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 499–506. Morgan Kaufmann, San Mateo, CA, 1993.
N. N. Schraudolph and T. J. Sejnowski. Tempering backpropagation networks: Not all weights are created equal. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 563–569. The MIT Press, Cambridge, MA, 1996.
T. J. Sejnowski. Storing covariance with nonlinearly interacting neurons. Journal of Mathematical Biology, 4:303–321, 1977.
S. Shah, F. Palmieri, and M. Datum. Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Networks, 5:779–787, 1992.
J. B. Tenenbaum and W. T. Freeman. Separating style and content. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, pages 662–668. The MIT Press, Cambridge, MA, 1997.
P. D. Turney. Exploiting context when learning to classify. In Proceedings of the European Conference on Machine Learning, pages 402–407, 1993.
P. D. Turney. Robust classification with context-sensitive features. In Proceedings of the Sixth International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pages 268–276, 1993.
T. P. Vogl, J. K. Mangis, A. K. Rigler, W. T. Zink, and D. L. Alkon. Accelerating the convergence of the back-propagation method. Biological Cybernetics, 59:257–263, 1988.
B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, Jr. Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proceedings of the IEEE, 64(8):1151–1162, 1976.
H. G. Zimmermann. Neuronale Netze als Entscheidungskalkül [Neural networks as a decision calculus]. In H. Rehkugler and H. G. Zimmermann, editors, Neuronale Netze in der Ökonomie: Grundlagen und finanzwirtschaftliche Anwendungen, pages 1–87. Vahlen Verlag, Munich, 1994.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Schraudolph, N.N. (1998). Centering Neural Network Gradient Factors. In: Orr, G.B., Müller, KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_11
Print ISBN: 978-3-540-65311-0
Online ISBN: 978-3-540-49430-0