Measures of Data and Classifier Complexity and the Training Sample Size

  • Chapter

Part of the book series: Advanced Information and Knowledge Processing ((AI&KP))

Summary

The size of the training set is an important factor in characterizing data complexity. When a standard Fisher linear discriminant or a Euclidean distance classifier is used to separate two multivariate Gaussian populations sharing a common covariance matrix, several measures of data complexity play an important role. The set of potential classification rules cannot be ignored when determining data complexity: the three factors (training sample size, data complexity, and classifier complexity) are mutually dependent. In situations where many classifiers are potentially useful, an exact characterization of data complexity requires a larger number of characteristics.
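As a minimal illustration of the interplay the summary describes (a sketch, not an experiment from the chapter), the Python code below compares a sample-based Fisher linear discriminant with the Euclidean distance (nearest-mean) classifier on two spherical Gaussian classes. The dimensionality, mean separation, and sample sizes are arbitrary assumptions chosen for demonstration. With an identity true covariance both rules converge to the same optimal hyperplane, but the Fisher rule must also estimate the covariance matrix, so at small training sample sizes the simpler Euclidean rule can match or beat it; the gap closes as the sample grows.

import numpy as np

rng = np.random.default_rng(0)

def make_data(n, mu1, mu2, cov):
    """Draw n samples per class from two Gaussians with a common covariance."""
    x1 = rng.multivariate_normal(mu1, cov, n)
    x2 = rng.multivariate_normal(mu2, cov, n)
    return x1, x2

def train_linear(x1, x2, use_covariance):
    """Fit (w, b) for the rule: assign class 1 iff w @ x + b > 0.
    use_covariance=True  -> sample-based Fisher linear discriminant
    use_covariance=False -> Euclidean distance (nearest-mean) classifier
    """
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    if use_covariance:
        # pooled sample covariance, as in the standard Fisher rule
        s = (np.cov(x1, rowvar=False) + np.cov(x2, rowvar=False)) / 2
        w = np.linalg.solve(s, m1 - m2)
    else:
        w = m1 - m2
    b = -w @ (m1 + m2) / 2
    return w, b

def test_error(w, b, x1, x2):
    e1 = np.mean(x1 @ w + b <= 0)   # class-1 points misclassified
    e2 = np.mean(x2 @ w + b > 0)    # class-2 points misclassified
    return (e1 + e2) / 2

p = 10                                                 # dimensionality (illustrative)
mu1 = np.zeros(p)
mu2 = np.full(p, 2.0 / np.sqrt(p))                     # Mahalanobis distance 2
cov = np.eye(p)
t1, t2 = make_data(20000, mu1, mu2, cov)               # large held-out test set

for n in (15, 30, 100, 1000):                          # training samples per class
    x1, x2 = make_data(n, mu1, mu2, cov)
    for name, flag in (("Fisher", True), ("Euclidean", False)):
        w, b = train_linear(x1, x2, flag)
        print(f"n={n:5d}  {name:9s}  error={test_error(w, b, t1, t2):.3f}")

With this separation the Bayes error is about 0.159; running the sketch shows both learning curves approaching it, with the Euclidean rule typically ahead at n=15 and the two rules indistinguishable by n=1000.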

Copyright information

© 2006 Springer-Verlag London Limited

Cite this chapter

Raudys, Š. (2006). Measures of Data and Classifier Complexity and the Training Sample Size. In: Basu, M., Ho, T.K. (eds) Data Complexity in Pattern Recognition. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84628-172-3_3

  • DOI: https://doi.org/10.1007/978-1-84628-172-3_3

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-171-6

  • Online ISBN: 978-1-84628-172-3
