Overtraining in Single-Layer Perceptrons
Overtraining originates in the different shapes of the learning-set and test-set cost surfaces. An essential peculiarity of non-linear single-layer perceptron (SLP) training is that the magnitudes of the weights grow with the number of iterations and, as a consequence, the criterion effectively used to find the weights changes. Therefore, non-linear SLP training can yield statistical classification rules of varying complexity, and the problem of preventing overtraining can be analyzed as the problem of selecting the proper type of statistical classifier. Which classifier is best for a given situation depends on the number of features, the size of the data, and its configuration. To obtain a wider range of classifiers in non-linear SLP training, several new complexity control procedures are suggested.
Key words: overtraining, optimization criterion, maximal margin, targets, scissors effect, regularization, anti-regularization, generalization, learning set size, dimensionality.
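The weight-growth phenomenon described above can be illustrated with a minimal pure-Python sketch: a single sigmoid neuron trained by squared-error gradient descent on a linearly separable toy set. Because the 0/1 targets are unreachable by a bounded-weight sigmoid, the weight norm grows without bound as iterations continue, while adding weight decay (one of the standard complexity control devices) keeps it bounded. The toy data, learning rate, and decay constant are illustrative assumptions, not values from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_slp(data, epochs, lr=0.5, weight_decay=0.0):
    """Train a one-neuron sigmoid perceptron with squared-error
    gradient descent; return the weight-norm history per epoch."""
    w = [0.0, 0.0]  # weights for 2 inputs
    b = 0.0
    norms = []
    for _ in range(epochs):
        for x, t in data:
            a = w[0] * x[0] + w[1] * x[1] + b
            y = sigmoid(a)
            # delta for squared error through the sigmoid output
            d = (y - t) * y * (1.0 - y)
            w[0] -= lr * (d * x[0] + weight_decay * w[0])
            w[1] -= lr * (d * x[1] + weight_decay * w[1])
            b -= lr * d
        norms.append(math.sqrt(w[0] ** 2 + w[1] ** 2))
    return norms

# Linearly separable toy data with 0/1 targets (hypothetical example):
data = [([1.0, 1.0], 1), ([2.0, 1.5], 1),
        ([-1.0, -1.0], 0), ([-2.0, -0.5], 0)]

free = train_slp(data, epochs=2000)                        # no complexity control
decayed = train_slp(data, epochs=2000, weight_decay=0.01)  # with weight decay
```

Comparing `free` and `decayed` shows the contrast: the unregularized weight norm is still growing after 2000 epochs, whereas the decayed run settles at a much smaller norm, i.e. at an effectively different, less complex classifier.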