Principal Components Identify MLP Hidden Layer Size for Optimal Generalisation Performance
A major concern when implementing a supervised artificial neural network solution to a classification or prediction problem is the network's performance on unseen data. The phenomenon of the network overfitting the training data is well understood and widely reported in the literature. Most researchers recommend either a time-consuming 'trial and error' approach to selecting the optimal number of weights for the network, or starting with a large network and pruning it to an optimal size. Current pruning techniques based on approximations of the Hessian matrix of the error surface are computationally intensive and prone to severe approximation errors if a suitably minimal training error has not been achieved. We propose a novel and simple design heuristic for a three-layer multi-layer perceptron (MLP) based on an eigenvalue decomposition of the covariance matrix of the middle-layer output. This technique identifies the neurons that contribute to redundancy in the data passing through the network; such neurons act as additional effective network parameters and have a deleterious effect on the smoothness of the classifier surface. Because the technique identifies redundancy in the network data, it is not dependent on training having reached the minimal error value at which the Levenberg-Marquardt approximation becomes valid. We report on simulations using the double-convex benchmark which show the utility of the proposed method.
Keywords: Middle Layer · Hessian Matrix · Error Surface · Robust Principal Component Analysis · Saliency Measure
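The core of the heuristic described in the abstract can be sketched as follows: collect the middle-layer activations over the training set, form their covariance matrix, and count how many eigenvalues are needed to explain most of the variance; the remaining directions correspond to redundant hidden neurons. This is a minimal illustrative sketch, not the authors' implementation: the function name and the cumulative-variance cut-off (`var_threshold`) are assumptions, and the paper's actual saliency criterion may differ.

```python
import numpy as np

def effective_hidden_size(hidden_activations, var_threshold=0.99):
    """Estimate how many middle-layer neurons carry non-redundant information.

    hidden_activations: (n_samples, n_hidden) matrix of middle-layer outputs
    collected by passing the training set through the trained MLP.
    var_threshold: fraction of total variance to retain (an assumed cut-off,
    used here in place of the paper's exact saliency measure).
    """
    # Covariance of the hidden-layer outputs across the training set
    cov = np.cov(hidden_activations, rowvar=False)
    # Eigenvalues of a covariance matrix are real; sort them in descending order
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Cumulative fraction of variance explained by the leading components
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    # Smallest number of components explaining var_threshold of the variance
    return int(np.searchsorted(ratios, var_threshold) + 1)
```

On data whose hidden activations are confined to a low-dimensional subspace, the count falls well below the nominal layer width, suggesting the extra neurons are redundant and the layer could be pruned to the estimated size.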