Abstract
This paper is set in the framework of supervised statistical learning for data mining: multiple sets of inputs are used to predict an output on the basis of a training set. A typical data mining problem involves sets of inputs that are large relative to the number of observed objects and correlated within groups. Standard tree-based procedures yield unstable and hard-to-interpret solutions, especially when the relationships are complex. In such cases, multiple splits defined on a suitable combination of inputs are required. This paper provides a methodology for building a tree-based model in which nodes are split by factorial multiple splitting variables. A recursive partitioning algorithm is introduced that uses a two-stage splitting criterion based on linear discriminant functions. The result is a fast, automated procedure that searches for factorial multiple splits able to capture suitable directions of variability among the sets of inputs. Real-world applications are discussed, and the results of a simulation study illustrate the properties of the proposed methodology.
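The abstract does not spell out the two-stage criterion, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a two-class output, takes Fisher's linear discriminant as the "linear discriminant function", ranks each group of inputs by the Fisher ratio of its discriminant scores (stage one), and then searches for a cut point on the winning score (stage two). All names (`factorial_split`, `fisher_direction`, `best_cut`, the `ridge` parameter, the toy data) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def column_means(rows: Sequence[Sequence[float]]) -> Vector:
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fisher_direction(x: Matrix, y: Sequence[int], ridge: float = 1e-8) -> Vector:
    """Two-class Fisher direction w = Sw^{-1} (m1 - m0).

    The small ridge on the diagonal is an assumption of this sketch,
    added only to keep the within-class scatter matrix invertible.
    """
    p = len(x[0])
    classes = {k: [r for r, lab in zip(x, y) if lab == k] for k in (0, 1)}
    means = {k: column_means(rows) for k, rows in classes.items()}
    sw = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    for k, rows in classes.items():
        for r in rows:
            d = [r[j] - means[k][j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j]
    diff = [means[1][j] - means[0][j] for j in range(p)]
    return solve(sw, diff)


def fisher_ratio(scores: Sequence[float], y: Sequence[int]) -> float:
    """Squared distance between class means over within-class scatter."""
    s0 = [s for s, lab in zip(scores, y) if lab == 0]
    s1 = [s for s, lab in zip(scores, y) if lab == 1]
    m0, m1 = sum(s0) / len(s0), sum(s1) / len(s1)
    within = sum((s - m0) ** 2 for s in s0) + sum((s - m1) ** 2 for s in s1)
    return (m1 - m0) ** 2 / (within + 1e-12)


def best_cut(scores: Sequence[float], y: Sequence[int]) -> Tuple[float, int]:
    """Stage two: cut point on the score with the fewest misclassifications."""
    values = sorted(set(scores))
    best = (len(y) + 1, 0.0)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        errors = sum((s > t) != (lab == 1) for s, lab in zip(scores, y))
        best = min(best, (errors, t))
    return best[1], best[0]


def factorial_split(x: Matrix, y: Sequence[int], groups: Dict[str, List[int]]):
    """Stage one: pick the input group whose discriminant score separates best."""
    best = None
    for name, cols in groups.items():
        sub = [[row[c] for c in cols] for row in x]
        w = fisher_direction(sub, y)
        scores = [sum(wi * v for wi, v in zip(w, r)) for r in sub]
        ratio = fisher_ratio(scores, y)
        if best is None or ratio > best[0]:
            best = (ratio, name, w, scores)
    _, name, w, scores = best
    threshold, errors = best_cut(scores, y)
    return name, w, threshold, errors


# Toy data (made up): in group "a" the classes separate only along a
# combination of the two inputs; group "b" carries almost no signal.
X = [
    [1.0, 0.0, 0.5, 1.0], [0.0, 1.2, 1.5, 0.2],
    [2.0, -0.8, 0.1, 0.9], [-1.0, 1.9, 1.2, 1.4],
    [3.0, 0.1, 0.6, 1.1], [0.0, 3.0, 1.4, 0.3],
    [4.0, -1.2, 0.2, 0.8], [-1.0, 4.1, 1.1, 1.5],
]
Y = [0, 0, 0, 0, 1, 1, 1, 1]
GROUPS = {"a": [0, 1], "b": [2, 3]}

group, weights, threshold, errors = factorial_split(X, Y, GROUPS)
print(group, weights, threshold, errors)
```

In the toy data neither input of group "a" separates the classes on its own, which is the situation the abstract describes: a split on a linear combination of within-group inputs succeeds where a single-variable split would not. A full tree would apply `factorial_split` recursively to the two child nodes.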
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mola, F., Siciliano, R. (2002). Discriminant Analysis and Factorial Multiple Splits in Recursive Partitioning for Data Mining. In: Roli, F., Kittler, J. (eds) Multiple Classifier Systems. MCS 2002. Lecture Notes in Computer Science, vol 2364. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45428-4_12
DOI: https://doi.org/10.1007/3-540-45428-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43818-2
Online ISBN: 978-3-540-45428-1
eBook Packages: Springer Book Archive