Principal Components Identify MLP Hidden Layer Size for Optimal Generalisation Performance

  • M. Girolami
Conference paper


One of the major concerns when implementing a supervised artificial neural network solution to a classification or prediction problem is the network's performance on unseen data. The phenomenon of the network overfitting the training data is well understood and widely reported in the literature. Most researchers either recommend a time-consuming 'trial and error' approach to selecting the optimal number of weights for the network, or start with a large network and prune it down to an optimal size. Current pruning techniques based on approximations of the Hessian matrix of the error surface are computationally intensive and prone to severe approximation errors if a suitably minimal training error has not been achieved. We propose a novel and simple design heuristic for a three-layer multi-layer perceptron (MLP) based on an eigenvalue decomposition of the covariance matrix of the middle-layer outputs. The technique identifies the hidden neurons that contribute redundancy to the data passing through the network; such neurons act as additional effective network parameters and have a deleterious effect on the smoothness of the classifier surface. Because the technique identifies redundancy in the network data, it does not depend on training having reached a minimal error value at which the Levenberg-Marquardt approximation becomes valid. We report on simulations using the double-convex benchmark which show the utility of the proposed method.
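As a rough illustration of the underlying idea (a minimal sketch, not the authors' exact procedure), the snippet below estimates an effective hidden-layer size from the eigenvalues of the covariance matrix of the middle-layer outputs. The NumPy-based helper, the variance-retention threshold, and the synthetic data are illustrative assumptions; directions with negligible variance correspond to redundant hidden units.

```python
import numpy as np

def effective_hidden_size(hidden_activations, variance_threshold=0.99):
    """Estimate the number of non-redundant hidden units from the
    eigenvalues of the hidden-layer output covariance matrix.

    hidden_activations : array of shape (n_samples, n_hidden),
        middle-layer outputs collected over the training set.
    variance_threshold : fraction of total variance to retain
        (an illustrative choice, not a value prescribed by the paper).
    """
    # Covariance matrix of the hidden-layer outputs (units in columns).
    cov = np.cov(hidden_activations, rowvar=False)

    # Eigenvalues of the symmetric covariance matrix, sorted descending.
    eigvals = np.linalg.eigvalsh(cov)[::-1]

    # Small eigenvalues mark directions carrying almost no variance,
    # i.e. redundant hidden units.
    explained = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(explained, variance_threshold) + 1)

# Synthetic example: 20 hidden units whose outputs effectively span a
# 5-dimensional subspace; the estimate should be close to 5, not 20.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 20))
activations = latent @ mixing + 0.01 * rng.normal(size=(1000, 20))
print(effective_hidden_size(activations))
```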


Keywords: Middle Layer, Hessian Matrix, Error Surface, Robust Principal Component Analysis, Saliency Measure





Copyright information

© Springer-Verlag Wien 1998

Authors and Affiliations

  • M. Girolami, Department of Computing and Information Systems, University of Paisley, Paisley, Scotland
