Abstract
It is almost a tautology that, if we wish to discriminate between two or more populations and can divide these populations and their training sets into homogeneous subsets, it is more efficient to perform the discriminant analysis on each subset and then combine the results. Such a stratification can be based on one or two variables highly correlated with the variable to be predicted. Our point of view is somewhat different: we use a classification tree built on all the available variables. We first recall our first attempt (presented at IFCS 2002 in Krakow). On an example of predicting the failure of enterprises, it yielded a gain of 5% in correctly classified data, using the classical Fisher's linear discriminant rule or logistic regression both before and after stratification. We then present a new method, again a classification tree, but built agglomeratively and with a multivariate criterion. We compare the two methods. Under the same conditions and on the same data set, the gain reaches 20%. Results are also presented for the methods applied to test sets. Finally, we draw conclusions.
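The idea of discriminating within homogeneous subsets and then combining the results can be illustrated with a minimal sketch. It is not the authors' method: the strata are assumed to be known in advance (standing in for the leaves of a classification tree), the data are synthetic, and the discriminant rule is a one-variable Fisher rule with equal priors, which reduces to assigning each observation to the class with the nearer mean. All function names (`fit_fisher_1d`, `make_data`, `run_demo`) are hypothetical.

```python
import random
from statistics import mean


def fit_fisher_1d(points):
    """Fit a one-variable Fisher rule and return the two class means.

    With equal priors and a common variance, Fisher's linear rule in one
    dimension assigns an observation to the class with the nearer mean.
    """
    m0 = mean(x for x, y in points if y == 0)
    m1 = mean(x for x, y in points if y == 1)
    return m0, m1


def predict(rule, x):
    m0, m1 = rule
    return 1 if abs(x - m1) < abs(x - m0) else 0


def make_data(n, rng):
    """Synthetic data (an assumption): the class effect on x has opposite
    sign in the two strata, so the pooled population is heterogeneous."""
    rows = []
    for _ in range(n):
        stratum = rng.choice([0, 1])
        y = rng.choice([0, 1])
        sign = 1 if stratum == 0 else -1
        x = rng.gauss(sign * (2 * y - 1) * 2.0, 1.0)
        rows.append((stratum, x, y))
    return rows


def run_demo(seed=0, n_train=400, n_test=1000):
    rng = random.Random(seed)
    train, test = make_data(n_train, rng), make_data(n_test, rng)

    # One rule fitted on the pooled training set (no stratification).
    global_rule = fit_fisher_1d([(x, y) for _, x, y in train])

    # One rule per stratum, fitted only on that stratum's training rows.
    stratum_rules = {
        s: fit_fisher_1d([(x, y) for st, x, y in train if st == s])
        for s in (0, 1)
    }

    # Accuracy on the held-out test set, before and after stratification.
    global_hits = sum(predict(global_rule, x) == y for _, x, y in test)
    strat_hits = sum(predict(stratum_rules[s], x) == y for s, x, y in test)
    return global_hits / len(test), strat_hits / len(test)


if __name__ == "__main__":
    global_acc, strat_acc = run_demo()
    print(f"pooled rule:      {global_acc:.3f}")
    print(f"stratified rules: {strat_acc:.3f}")
```

The synthetic data are deliberately extreme, so the size of the gap between the two accuracies says nothing about the gains reported in the abstract; the sketch only shows the mechanics of fitting one rule per stratum and pooling the predictions.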
Copyright information
© 2005 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Rasson, JP., Pirçon, JY., Roland, F. (2005). Stratification Before Discriminant Analysis: A Must?. In: Baier, D., Wernecke, KD. (eds) Innovations in Classification, Data Science, and Information Systems. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-26981-9_7
DOI: https://doi.org/10.1007/3-540-26981-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23221-6
Online ISBN: 978-3-540-26981-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)