Measures of Data and Classifier Complexity and the Training Sample Size

  • Chapter

Part of the book series: Advanced Information and Knowledge Processing ((AI&KP))

Summary

The size of the training set is an important factor in characterizing data complexity. When a standard Fisher linear discriminant or a Euclidean distance classifier is used to separate two multivariate Gaussian populations sharing a common covariance matrix, several measures of data complexity play an important role. The set of potential classification rules cannot be ignored when determining data complexity: the three factors (training sample size, data complexity, and classifier complexity) are mutually dependent. In situations where many classifiers are potentially useful, an exact characterization of data complexity requires a larger number of characteristics.
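As a minimal illustration of the interplay the summary describes (a sketch, not an experiment from the chapter), the Python code below compares a sample-based Fisher linear discriminant with the Euclidean distance (nearest-mean) classifier on two spherical Gaussian classes. The dimensionality, mean separation, and sample sizes are arbitrary assumptions chosen for demonstration. With an identity true covariance both rules converge to the same optimal hyperplane, but the Fisher rule must also estimate the covariance matrix, so at small training sample sizes the simpler Euclidean rule can match or beat it; the gap closes as the sample grows.

import numpy as np

rng = np.random.default_rng(0)

def make_data(n, mu1, mu2, cov):
    """Draw n samples per class from two Gaussians with a common covariance."""
    x1 = rng.multivariate_normal(mu1, cov, n)
    x2 = rng.multivariate_normal(mu2, cov, n)
    return x1, x2

def train_linear(x1, x2, use_covariance):
    """Fit (w, b) for the rule: assign class 1 iff w @ x + b > 0.
    use_covariance=True  -> sample-based Fisher linear discriminant
    use_covariance=False -> Euclidean distance (nearest-mean) classifier
    """
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    if use_covariance:
        # pooled sample covariance, as in the standard Fisher rule
        s = (np.cov(x1, rowvar=False) + np.cov(x2, rowvar=False)) / 2
        w = np.linalg.solve(s, m1 - m2)
    else:
        w = m1 - m2
    b = -w @ (m1 + m2) / 2
    return w, b

def test_error(w, b, x1, x2):
    e1 = np.mean(x1 @ w + b <= 0)   # class-1 points misclassified
    e2 = np.mean(x2 @ w + b > 0)    # class-2 points misclassified
    return (e1 + e2) / 2

p = 10                                                 # dimensionality (illustrative)
mu1 = np.zeros(p)
mu2 = np.full(p, 2.0 / np.sqrt(p))                     # Mahalanobis distance 2
cov = np.eye(p)
t1, t2 = make_data(20000, mu1, mu2, cov)               # large held-out test set

for n in (15, 30, 100, 1000):                          # training samples per class
    x1, x2 = make_data(n, mu1, mu2, cov)
    for name, flag in (("Fisher", True), ("Euclidean", False)):
        w, b = train_linear(x1, x2, flag)
        print(f"n={n:5d}  {name:9s}  error={test_error(w, b, t1, t2):.3f}")

With this separation the Bayes error is about 0.159; running the sketch shows both learning curves approaching it, with the Euclidean rule typically ahead at n=15 and the two rules indistinguishable by n=1000.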

Copyright information

© 2006 Springer-Verlag London Limited

Cite this chapter

Raudys, Š. (2006). Measures of Data and Classifier Complexity and the Training Sample Size. In: Basu, M., Ho, T.K. (eds) Data Complexity in Pattern Recognition. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84628-172-3_3

  • DOI: https://doi.org/10.1007/978-1-84628-172-3_3

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-171-6

  • Online ISBN: 978-1-84628-172-3
