Parallel MARS Algorithm Based on B-splines
We investigate one possible way of improving Friedman’s Multivariate Adaptive Regression Splines (MARS) algorithm, which was designed for flexible modelling of high-dimensional data. In our version of MARS, called BMARS, we use B-splines instead of truncated power basis functions. Because B-splines have compact support, we can introduce the notion of a “scale” of a basis function. The algorithm starts building models from large-scale basis functions and switches to a smaller scale once the fitting ability of the large-scale splines has been exhausted. The process is repeated until the prespecified number of basis functions has been produced. In addition, we discuss a parallelisation of BMARS and an application of the algorithm to the processing of a large commercial data set. The results demonstrate the computational efficiency of our algorithm and its ability to generate models competitive with those of the original MARS.
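The contrast between the two kinds of basis function can be illustrated with a minimal sketch. It is not the BMARS procedure itself: it assumes linear B-splines (hat functions) in one variable, takes the half-width of the support as a stand-in for the “scale”, and replaces adaptive selection with a plain least-squares fit that adds all basis functions of one scale at a time. The names `truncated_power`, `hat_bspline` and `fit_coarse_to_fine` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def truncated_power(x, knot):
    """MARS-style truncated linear basis (x - knot)_+.

    It is nonzero for every x > knot, so its support is unbounded.
    """
    return np.maximum(x - knot, 0.0)

def hat_bspline(x, center, half_width):
    """Linear B-spline (hat function) centred at `center`.

    It is nonzero only on (center - half_width, center + half_width),
    so `half_width` plays the role of the assumed "scale".
    """
    return np.maximum(1.0 - np.abs(x - center) / half_width, 0.0)

def fit_coarse_to_fine(x, y, half_widths):
    """Add hat functions scale by scale (largest first) and refit.

    Returns the residual sum of squares after each scale is added.
    This is a simplification: every basis function of a scale is
    included, with no adaptive selection.
    """
    columns = [np.ones_like(x)]  # intercept term
    rss_per_scale = []
    for h in half_widths:
        # Centres spaced one half-width apart across the data range.
        centers = np.arange(x.min(), x.max() + h / 2, h)
        columns.extend(hat_bspline(x, c, h) for c in centers)
        design = np.column_stack(columns)
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss_per_scale.append(float(np.sum((y - design @ coef) ** 2)))
    return rss_per_scale

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 201)
    y = np.sin(6.0 * x)
    for h, rss in zip([0.5, 0.25, 0.1], fit_coarse_to_fine(x, y, [0.5, 0.25, 0.1])):
        print(f"half-width {h}: RSS = {rss:.6f}")
```

Because each step only adds columns to the design matrix, the residual sum of squares cannot increase from one scale to the next; the point at which it stops decreasing appreciably is a rough analogue of the large-scale splines having exhausted their fitting ability.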
Keywords: MARS, B-splines, Data Mining, Parallel Algorithms
We are most grateful to Prof J.H. Friedman for suggesting the idea of the experiment involving the synthetic data set and to Dr B. Turlach for very fruitful discussions. Our thanks are also due to the anonymous referees for their constructive comments which greatly helped to improve the quality of this paper. The research of S. Bakin was supported by the Australian Government (Overseas Postgraduate Research Scholarship), by the Australian National University (ANU PhD Scholarship) and, also, by the Advanced Computational Systems CRC (ACSys), Australia.