
Variable Selection for Discrimination of More Than Two Classes Where Data are Sparse

  • Conference paper
From Data and Information Analysis to Knowledge Engineering

Abstract

In classification, the number of observations required grows drastically as the number of variables increases. In this paper we present an approach that allows the greatest possible degree of variable selection by splitting a K-class classification problem into pairwise problems. The principle exploits the fact that a variable that discriminates between two classes will not necessarily do so for all class pairs.
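The idea of selecting variables separately for each class pair can be illustrated with a minimal sketch. The abstract does not state which selection criterion is used, so the Welch t-statistic, the `threshold` value, the helper name `select_variables_per_pair`, and the toy data are all assumptions made purely for illustration.

```python
from itertools import combinations

import numpy as np


def select_variables_per_pair(X, y, threshold=3.0):
    """Hypothetical helper: for every pair of classes, keep the variables
    whose absolute Welch t-statistic exceeds `threshold` (assumed criterion)."""
    selected = {}
    labels = np.unique(y).tolist()
    for a, b in combinations(labels, 2):
        Xa, Xb = X[y == a], X[y == b]
        # Standard error of the difference in means, per variable.
        se = np.sqrt(Xa.var(axis=0, ddof=1) / len(Xa)
                     + Xb.var(axis=0, ddof=1) / len(Xb))
        t = (Xa.mean(axis=0) - Xb.mean(axis=0)) / se
        selected[(a, b)] = np.flatnonzero(np.abs(t) > threshold)
    return selected


# Toy data (assumed): three classes, three variables, identical spread.
# Variable 0 separates class 1 from the rest, variable 1 separates class 2,
# and variable 2 carries no class information at all.
spread = np.array([-1.0, -0.5, 0.0, 0.0, 0.5, 1.0])
means = np.array([[0.0, 0.0, 0.0],
                  [5.0, 0.0, 0.0],
                  [0.0, 5.0, 0.0]])
X = np.vstack([m + spread[:, None] for m in means])
y = np.repeat([0, 1, 2], len(spread))

sel = select_variables_per_pair(X, y)
for pair, variables in sel.items():
    print(pair, variables.tolist())
```

In this toy setting each pairwise problem retains only the variables that actually separate that pair, so the pair (0, 1) works in a smaller space than a single K-class rule, which would need both informative variables at once.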

We further present the construction of a classification rule from the pairwise solutions using the Pairwise Coupling algorithm of Hastie and Tibshirani (1998). The suggested procedure can be applied to any classification method. Finally, situations with a lack of data in multidimensional spaces are investigated on different simulated data sets to illustrate the problem and the possible gain. The principle is compared to the classical approach of linear and quadratic discriminant analysis.
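How pairwise solutions might be combined into one rule can be sketched as follows. This is a minimal sketch of the iterative coupling scheme described by Hastie and Tibshirani (1998), not the authors' implementation: the function name `pairwise_coupling`, the equal default weights `n`, the stopping tolerance, and the example probabilities are assumptions. Here `r[i, j]` stands for the estimated probability of class i given that the observation belongs to class i or j.

```python
import numpy as np


def pairwise_coupling(r, n=None, tol=1e-10, max_iter=1000):
    """Hypothetical helper: couple pairwise probabilities r[i, j] into a
    single vector of class probabilities p.
    n[i, j] are pair weights (e.g. pairwise sample sizes); equal by default."""
    r = np.asarray(r, dtype=float)
    K = r.shape[0]
    n = np.ones((K, K)) if n is None else np.asarray(n, dtype=float)
    off = ~np.eye(K, dtype=bool)          # mask that excludes j == i
    p = np.full(K, 1.0 / K)               # start from uniform probabilities
    for _ in range(max_iter):
        p_old = p.copy()
        for i in range(K):
            mu_i = p[i] / (p[i] + p)      # model-implied mu_ij for all j
            num = np.sum(n[i, off[i]] * r[i, off[i]])
            den = np.sum(n[i, off[i]] * mu_i[off[i]])
            p[i] *= num / den             # multiplicative update for class i
            p /= p.sum()                  # renormalise to sum to one
        if np.max(np.abs(p - p_old)) < tol:
            break
    return p


# Consistency check on assumed values: pairwise probabilities derived
# from a known p_true should be coupled back to p_true.
p_true = np.array([0.5, 0.3, 0.2])
r = p_true[:, None] / (p_true[:, None] + p_true[None, :])
p_hat = pairwise_coupling(r)
predicted_class = int(np.argmax(p_hat))
```

The observation is then assigned to the class with the largest coupled probability. Because only the matrix `r` is needed, each pairwise classifier is free to use its own variable subset and any classification method.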


References

  • BISHOP, Y., FIENBERG, S. and HOLLAND, P. (1975): Discrete multivariate analysis. MIT Press, Cambridge.

  • BRADLEY, R. and TERRY, M. (1952): The rank analysis of incomplete block designs, I. The method of paired comparisons. Biometrika, 39, 324–345.

  • BREIMAN, L., FRIEDMAN, J., OLSHEN, R. and STONE, C. (1984): Classification and regression trees. Chapman & Hall, NY.

  • HAJEK, J. (1969): A course in nonparametric statistics. Holden Day, San Francisco.

  • HASTIE, T. and TIBSHIRANI, R. (1998): Classification by Pairwise Coupling. Annals of Statistics, 26(1), 451–471.

  • SCOTT, D. (1992): Multivariate Density Estimation. Wiley, NY.

Copyright information

© 2006 Springer Berlin · Heidelberg

About this paper

Cite this paper

Szepannek, G., Weihs, C. (2006). Variable Selection for Discrimination of More Than Two Classes Where Data are Sparse. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_86
