Abstract
It has long been known that neural networks can learn faster when their input and hidden unit activities are centered about zero; recently we have extended this approach to also encompass the centering of error signals [15]. Here we generalize this notion to all factors involved in the network’s gradient, leading us to propose centering the slope of hidden unit activation functions as well. Slope centering removes the linear component of backpropagated error; this improves credit assignment in networks with shortcut connections. Benchmark results show that this can speed up learning significantly without adversely affecting the trained network’s generalization ability.
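As an illustration of the idea, here is a minimal numpy sketch (our own, not taken from the chapter) of one forward/backward pass through a single-hidden-layer network with shortcut connections. Inputs and hidden activities are centered by subtracting their batch means, and the hidden deltas use slope-centered activation derivatives, so the linear component of the backpropagated error is left for the shortcut weights to absorb. All variable names and the toy dimensions are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 8 samples, 4 inputs, 5 hidden units, 2 outputs.
X = rng.normal(size=(8, 4))
T = rng.normal(size=(8, 2))

W1 = rng.normal(scale=0.5, size=(4, 5))   # input -> hidden
W2 = rng.normal(scale=0.5, size=(5, 2))   # hidden -> output
Ws = rng.normal(scale=0.5, size=(4, 2))   # input -> output shortcut

# --- Forward pass with input and activity centering ---
Xc = X - X.mean(axis=0)                   # center inputs about zero
net = Xc @ W1
H = np.tanh(net)
Hc = H - H.mean(axis=0)                   # center hidden activities
Y = Hc @ W2 + Xc @ Ws                     # shortcut carries the linear part

# --- Backward pass with slope centering ---
dY = Y - T                                # output error (squared loss)
slope = 1.0 - np.tanh(net) ** 2           # tanh'(net)
slope_c = slope - slope.mean(axis=0)      # center the slope per hidden unit
dH = (dY @ W2.T) * slope_c                # linear error component removed

# Weight gradients; the shortcut receives the linear credit directly.
gW2 = Hc.T @ dY
gWs = Xc.T @ dY
gW1 = Xc.T @ dH
```

After centering, the per-unit batch means of both the hidden activities and the activation slopes are zero by construction, which is what improves the conditioning of the gradient.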
References
J. Anderson and E. Rosenfeld, editors. Neurocomputing: Foundations of Research. MIT Press, Cambridge, 1988.
R. Battiti. Accelerated back-propagation learning: Two optimization methods. Complex Systems, 3:331–342, 1989.
R. Battiti. First- and second-order methods for learning: Between steepest descent and Newton’s method. Neural Computation, 4(2):141–166, 1992.
E. Bienenstock, L. Cooper, and P. Munro. Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2, 1982. Reprinted in [1].
D. H. Deterding. Speaker Normalisation for Automatic Speech Recognition. PhD thesis, University of Cambridge, 1989.
M. Finke and K.-R. Müller. Estimating a-posteriori probabilities using stochastic network models. In M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman, and A. S. Weigend, editors, Proceedings of the 1993 Connectionist Models Summer School, Boulder, CO, 1994. Lawrence Erlbaum Associates, Hillsdale, NJ.
T. J. Hastie and R. J. Tibshirani. Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6):607–616, 1996.
M. Herrmann. On the merits of topography in neural maps. In T. Kohonen, editor, Proceedings of the Workshop on Self-Organizing Maps, pages 112–117. Helsinki University of Technology, 1997.
S. Hochreiter and J. Schmidhuber. Feature extraction through lococode. To appear in Neural Computation, 1998.
N. Intrator. Feature extraction using an unsupervised neural network. Neural Computation, 4(1):98–107, 1992.
A. Lapedes and R. Farber. A self-optimizing, nonsymmetrical neural net for content addressable memory and pattern recognition. Physica D, 22:247–259, 1986.
Y. LeCun, I. Kanter, and S. A. Solla. Eigenvalues of covariance matrices: Application to neural-network learning. Physical Review Letters, 66(18):2396–2399, 1991.
A. J. Robinson. Dynamic Error Propagation Networks. PhD thesis, University of Cambridge, 1989.
N. N. Schraudolph and T. J. Sejnowski. Unsupervised discrimination of clustered data via optimization of binary information gain. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 499–506. Morgan Kaufmann, San Mateo, CA, 1993.
N. N. Schraudolph and T. J. Sejnowski. Tempering backpropagation networks: Not all weights are created equal. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 563–569. The MIT Press, Cambridge, MA, 1996.
T. J. Sejnowski. Storing covariance with nonlinearly interacting neurons. Journal of Mathematical Biology, 4:303–321, 1977.
S. Shah, F. Palmieri, and M. Datum. Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Networks, 5:779–787, 1992.
J. B. Tenenbaum and W. T. Freeman. Separating style and content. In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, pages 662–668. The MIT Press, Cambridge, MA, 1997.
P. D. Turney. Exploiting context when learning to classify. In Proceedings of the European Conference on Machine Learning, pages 402–407, 1993.
P. D. Turney. Robust classification with context-sensitive features. In Proceedings of the Sixth International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pages 268–276, 1993.
T. P. Vogl, J. K. Mangis, A. K. Rigler, W. T. Zink, and D. L. Alkon. Accelerating the convergence of the back-propagation method. Biological Cybernetics, 59:257–263, 1988.
B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, Jr. Stationary and nonstationary learning characteristics of the LMS adaptive filter. Proceedings of the IEEE, 64(8):1151–1162, 1976.
H. G. Zimmermann. Neuronale Netze als Entscheidungskalkül [Neural networks as a decision calculus]. In H. Rehkugler and H. G. Zimmermann, editors, Neuronale Netze in der Ökonomie: Grundlagen und finanzwirtschaftliche Anwendungen, pages 1–87. Vahlen Verlag, Munich, 1994.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Schraudolph, N.N. (1998). Centering Neural Network Gradient Factors. In: Orr, G.B., Müller, KR. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_11
Print ISBN: 978-3-540-65311-0
Online ISBN: 978-3-540-49430-0