Scalable Decision Tree Construction
Scalable classification tree construction; Scalable top-down decision tree construction; Tree-structured classifier
Decision trees are popular classification models. They are usually constructed greedily, top-down, from a training dataset. In many modern applications the training dataset is very large, so decision tree construction algorithms that scale with the size of the training dataset are needed.
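As an illustration of greedy top-down construction, the following is a minimal in-memory sketch, not any of the scalable algorithms cited in this entry. It assumes numeric attributes, binary splits of the form "attribute <= threshold", and Gini impurity as the split criterion; the names gini, best_split, build_tree, and predict are hypothetical and chosen only for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, attribute index, threshold) of the best
    binary split, or None if no attribute can separate the rows."""
    best = None
    for j in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so that both sides of the split are non-empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(rows, labels):
    """Greedy top-down construction: pick the locally best split, then recurse."""
    split = best_split(rows, labels) if len(set(labels)) > 1 else None
    if split is None:
        # Leaf node: predict the majority class of the remaining records.
        return Counter(labels).most_common(1)[0][0]
    _, j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "attr": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left]),
        "right": build_tree([r for r, _ in right], [y for _, y in right]),
    }


def predict(tree, row):
    """Follow splits from the root until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["attr"]] <= tree["threshold"] else tree["right"]
    return tree


# Toy usage: one numeric attribute, two classes.
rows = [(1,), (2,), (3,), (4,)]
labels = ["a", "a", "b", "b"]
tree = build_tree(rows, labels)
```

Note that best_split in this sketch rescans every record at a node for every candidate threshold. That is harmless for data that fits in memory, but it is the kind of repeated access that becomes costly once the training dataset is very large.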
Decision trees, in particular classification trees, have a long history in both the statistics and the machine learning communities [12, 13]. Scalability was not much of a concern until the advent of data mining brought training datasets that were orders of magnitude larger than those in traditional applications in machine learning and statistics.
Scalability concerns in classification started with the work by Agrawal et al., who presented an interval classifier that generated classification functions that distinguish the different groups...
- 1. Agrawal R, Ghosh SP, Imielinski T, Iyer BR, Swami AN. An interval classifier for database mining applications. In: Proceedings of the 18th International Conference on Very Large Data Bases; 1992. p. 560–73.
- 3. Alsabti K, Ranka S, Singh V. Clouds: a decision tree classifier for large datasets. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining; 1998. p. 2–8.
- 7. Graefe G, Fayyad U, Chaudhuri S. On the efficient gathering of sufficient statistics for classification from large SQL databases. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining; 1998. p. 204–8.
- 9. Mehta M, Rissanen J, Agrawal R. MDL-based decision tree pruning. In: Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining; 1995.
- 10. Mehta M, Agrawal R, Rissanen J. SLIQ: a fast scalable classifier for data mining. In: Advances in Database Technology, Proceedings of the 5th International Conference on Extending Database Technology; 1996.
- 12. Quinlan JR. Induction of decision trees. Mach Learn. 1986;1(1):81–106.
- 13. Quinlan JR. C4.5: programs for machine learning. San Mateo: Morgan Kaufmann; 1993.
- 14. Rastogi R, Shim K. PUBLIC: a decision tree classifier that integrates building and pruning. In: Proceedings of the 24th International Conference on Very Large Data Bases; 1998. p. 404–15.
- 15. Shafer J, Agrawal R, Mehta M. SPRINT: a scalable parallel classifier for data mining. In: Proceedings of the 22nd International Conference on Very Large Data Bases; 1996.
- 16. Sreenivas MK, AlSabti K, Ranka S. Parallel out-of-core decision tree classifiers. In: Kargupta H, Chan P, editors. Advances in distributed and parallel knowledge discovery. Cambridge, MA: AAAI; 2000. p. 317–36.
- 17. Srivastava A, Han E, Kumar V, Singh V. Parallel formulations of decision-tree classification algorithms. Data Min Knowl Disc. 1999;3(3):237–61.