Abstract
Classification and regression are fundamental data mining techniques. The goal of such techniques is to build predictors from a training dataset and use them to predict the properties of new data. For a wide range of techniques, combining predictors built on samples of the training dataset yields lower error rates, faster construction, or both, compared with a single predictor built from the entire training dataset. This suggests a natural parallelization strategy in which the predictors based on samples are built independently and hence concurrently. We discuss the performance implications for two subclasses of techniques: those in which the predictors are independent of one another, and those in which knowing a set of predictors reduces the difficulty of finding a new one.
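The following is a minimal sketch of the sample-based parallel strategy the abstract describes: each worker independently fits a predictor on a bootstrap sample of the training data, and the predictors are combined by majority vote. It assumes scikit-learn-style estimators and an arbitrary choice of base classifier and sample counts; it illustrates the general idea rather than the paper's own implementation or cost model.

```python
# Sketch: independent predictors built concurrently from samples, then combined.
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_on_sample(args):
    """Fit one predictor on a bootstrap sample drawn with the given seed."""
    X, y, seed = args
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample of the training set
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])


def combine(predictors, X):
    """Majority vote over the independently built predictors."""
    votes = np.stack([p.predict(X) for p in predictors])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)


if __name__ == "__main__":
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    # The predictors do not depend on one another, so they can be built in parallel.
    with ProcessPoolExecutor() as pool:
        predictors = list(pool.map(fit_on_sample, [(X, y, s) for s in range(8)]))
    print("training accuracy:", (combine(predictors, X) == y).mean())
```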
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Skillicorn, D.B. (2002). Parallel Predictor Generation. In: Zaki, M.J., Ho, C.-T. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_9
Print ISBN: 978-3-540-67194-7
Online ISBN: 978-3-540-46502-7