Robust impurity measures in decision trees

  • Tomàs Aluja-Banet
  • Eduard Nafria
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Tree-based methods are a statistical procedure for automatic learning from data, whose main characteristic is the simplicity of the results they produce. This virtue is also their defect, since the tree-growing process depends heavily on the data: small fluctuations in the data may cause large changes in the resulting tree. Our main objective was to define data diagnostics that prevent internal instability in the tree-growing process before a particular split is made. We present a general formulation of the impurity of a node as a function of the proximity between the individuals in the node and its representative. From it we compute a stability measure for a split, which lets us define more robust splits. We have also studied the theoretical complexity of this algorithm and its applicability to large data sets.
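The abstract does not spell out which proximity, representative, or stability measure the authors use, so the following Python sketch fills them in with loudly labeled assumptions: squared Euclidean distance as the proximity, the node centroid as the representative, and a leave-one-out recomputation of the impurity decrease as a stand-in stability diagnostic. The names `node_impurity`, `split_gain`, and `leave_one_out_gains` are hypothetical and do not come from the paper. Under these assumptions the impurity has two familiar special cases: with one-hot class indicators, the average squared distance to the centroid equals the Gini index 1 - Σ p_k², and with a single numeric response it equals the within-node variance used in regression trees.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sq_euclidean(a: Vector, b: Vector) -> float:
    # Assumed proximity: squared Euclidean distance between two profiles.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> List[float]:
    # Assumed representative: the mean vector of the individuals in the node.
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def node_impurity(points: List[Vector],
                  proximity: Callable[[Vector, Vector], float] = sq_euclidean) -> float:
    # Average proximity between each individual and the node representative.
    if not points:
        return 0.0
    rep = centroid(points)
    return sum(proximity(p, rep) for p in points) / len(points)


def split_gain(points: List[Vector], goes_left: List[bool]) -> float:
    # Impurity decrease of a binary split; children weighted by their share of n.
    n = len(points)
    if n == 0:
        return 0.0
    left = [p for p, g in zip(points, goes_left) if g]
    right = [p for p, g in zip(points, goes_left) if not g]
    children = (len(left) / n) * node_impurity(left) + (len(right) / n) * node_impurity(right)
    return node_impurity(points) - children


def leave_one_out_gains(points: List[Vector], goes_left: List[bool]) -> List[float]:
    # Hypothetical stability diagnostic (not the paper's measure): recompute the
    # gain with each individual removed; a wide spread flags a fragile split.
    gains = []
    for i in range(len(points)):
        rest = list(points[:i]) + list(points[i + 1:])
        mask = list(goes_left[:i]) + list(goes_left[i + 1:])
        gains.append(split_gain(rest, mask))
    return gains


def one_hot(label: int, k: int) -> List[float]:
    # Class label -> indicator vector; with these, node_impurity equals the Gini index.
    return [1.0 if j == label else 0.0 for j in range(k)]


if __name__ == "__main__":
    node = [one_hot(c, 2) for c in [0, 0, 1, 1]]
    split = [True, True, False, False]
    print(node_impurity(node))       # → 0.5 (Gini index of a 50/50 node)
    print(split_gain(node, split))   # → 0.5 (both children are pure)
    print(leave_one_out_gains(node, split))
```

Because the proximity is passed in as a parameter, another distance can be substituted without touching the split logic. That separation is one reading of what a "general formulation" of impurity buys, though the paper's own choice of proximity is not given here.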


Keywords: Regression Tree, Child Node, Classification Tree, Convex Polygon, Gini Index





Copyright information

© Springer Japan 1998

Authors and Affiliations

  • Tomàs Aluja-Banet (1)
  • Eduard Nafria (1)
  1. Dept. of Statistics and Operational Research, Universitat Politècnica de Catalunya, Barcelona, Spain
