Overtraining in Single-Layer Perceptrons

  • S. Raudys
Conference paper
Part of the International Centre for Mechanical Sciences book series (CISM, volume 382)


Overtraining originates in the difference between the surfaces of the learning-set and test-set cost functions. The essential peculiarities of non-linear single-layer perceptron (SLP) training are the growth of the weight magnitudes as the number of iterations increases and, as a consequence, a change in the criterion used to find the weights. In non-linear SLP training one can therefore obtain various statistical classification rules of different complexity, and the problem of preventing overtraining can be analyzed as a problem of selecting the proper type of statistical classifier. Which classifier is best in a given situation depends on the number of features, the size of the data set, and its configuration. To obtain a wider range of classifiers in non-linear SLP training, several new complexity control procedures are suggested.
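
The growth of the weights during training can be observed in a small experiment. The following is only a minimal sketch, not the procedure used in the paper: it assumes a sigmoid activation, a sum-of-squares cost with targets 0 and 1, batch gradient descent from zero initial weights, and toy Gaussian data. The function names, the learning rate, and the data sizes are all illustrative assumptions. It records the weight norm together with the learning-set and test-set error rates after every epoch, so the two error curves can be compared as the weights grow.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_data(n_per_class, dim, shift, rng):
    # Assumed toy data: two spherical Gaussian classes with means -shift and +shift.
    data = []
    for target, sign in ((0.0, -1.0), (1.0, 1.0)):
        for _ in range(n_per_class):
            x = [rng.gauss(sign * shift, 1.0) for _ in range(dim)]
            data.append((x, target))
    return data


def activation(w, b, x):
    # Weighted sum fed into the sigmoid.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def error_rate(w, b, data):
    # Fraction of vectors that fall on the wrong side of the hyperplane.
    wrong = sum(1 for x, t in data if (activation(w, b, x) > 0.0) != (t > 0.5))
    return wrong / len(data)


def train_slp(train, test, epochs=300, lr=0.5):
    # Batch gradient descent on the sum-of-squares cost, starting from zero weights.
    dim = len(train[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(train)
    history = []
    for epoch in range(1, epochs + 1):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in train:
            y = sigmoid(activation(w, b, x))
            # Derivative of 0.5 * (y - t)^2 with respect to the weighted sum.
            delta = (y - t) * y * (1.0 - y)
            for i in range(dim):
                grad_w[i] += delta * x[i]
            grad_b += delta
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
        norm = math.sqrt(sum(wi * wi for wi in w))
        history.append((epoch, norm, error_rate(w, b, train), error_rate(w, b, test)))
    return w, b, history


rng = random.Random(0)
train_set = make_data(10, 5, 0.5, rng)   # small learning set, 5 features
test_set = make_data(500, 5, 0.5, rng)   # larger independent test set
w, b, history = train_slp(train_set, test_set)
for epoch, norm, train_err, test_err in history[::50]:
    print(epoch, round(norm, 3), train_err, test_err)
```

Each printed row gives the epoch, the weight norm, the learning-set error, and the test-set error. Varying the learning-set size or the number of features in `make_data` is a simple way to explore how the gap between the two error curves depends on the data.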

Key words

overtraining, optimization criterion, maximal margin, targets, scissors effect, regularization, anti-regularization, generalization, learning set size, dimensionality





Copyright information

© Springer-Verlag Wien 1997

Authors and Affiliations

  • S. Raudys, Vilnius Gediminas Technical University, Vilnius, Lithuania
