Parallel Predictor Generation
Classification and regression are fundamental data mining techniques. The goal of such techniques is to build predictors from a training dataset and use them to predict the properties of new data. For a wide range of techniques, combining predictors built on samples from the training dataset yields lower error rates, faster construction, or both, compared with a single predictor built from the entire training dataset. This provides a natural parallelization strategy in which predictors based on samples are built independently and hence concurrently. We discuss the performance implications for two subclasses: those in which predictors are independent, and those in which knowing a set of predictors reduces the difficulty of finding a new one.
Keywords: Training Dataset, Linear Speedup, Sequential Algorithm, Inductive Logic Programming
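The sampling-based parallel strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the threshold-stump learner, the toy dataset, and the use of a thread pool are all assumptions made for the example. Each predictor depends only on its own bootstrap sample, so all predictors can be built concurrently and then combined by majority vote.

```python
# Illustrative sketch: predictors trained on independent bootstrap samples
# in parallel, combined by unweighted majority vote. The stump learner and
# dataset are stand-ins chosen for the example, not taken from the paper.
import random
from concurrent.futures import ThreadPoolExecutor

def fit_stump(sample):
    """Pick the 1-D threshold that minimizes training error on the sample."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x > threshold) != y for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def predict(thresholds, x):
    """Combine the independently built predictors by majority vote."""
    votes = sum(x > t for t in thresholds)
    return votes * 2 > len(thresholds)

# Toy training set: the label is True exactly when the feature exceeds 0.5.
rng = random.Random(0)
data = [(x, x > 0.5) for x in (rng.random() for _ in range(200))]

def build_on_sample(seed):
    """Draw a bootstrap sample and fit one predictor on it."""
    r = random.Random(seed)
    sample = [data[r.randrange(len(data))] for _ in range(len(data))]
    return fit_stump(sample)

# Because the predictors are independent, construction parallelizes trivially.
with ThreadPoolExecutor() as pool:
    thresholds = list(pool.map(build_on_sample, range(8)))
```

With labels that are a clean threshold function of the feature, every learned stump lands near 0.5, so the ensemble classifies clear cases correctly; the same pattern applies when the base learner is a decision tree or any other independent predictor.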