Synonyms
Scalable classification tree construction; Scalable top-down decision tree construction; Tree-structured classifier
Definition
Decision trees are popular classification models. They are usually constructed greedily, top-down, from a training dataset. In many modern applications the training dataset is very large, so decision tree construction algorithms that scale with the size of the training dataset are needed.
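The greedy top-down construction mentioned above can be illustrated with a minimal sketch. It assumes numeric attributes, binary splits of the form "attribute <= threshold", and the Gini index as the split criterion; the function names (`gini`, `best_split`, `build_tree`, `predict`) and the `max_depth` parameter are hypothetical choices made for this illustration and are not taken from any of the algorithms cited in this entry.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, attribute index, threshold) of the best
    binary split, or None if no split separates the records."""
    best = None
    n = len(rows)
    for attr in range(len(rows[0])):
        for threshold in sorted({r[attr] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[attr] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[attr] > threshold]
            if not left or not right:
                continue  # a split must send records to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    return best


def build_tree(rows, labels, max_depth=5):
    """Greedy top-down construction: choose the locally best split,
    partition the records, and recurse on each partition."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": majority}
    _, attr, threshold = split
    left = [i for i, r in enumerate(rows) if r[attr] <= threshold]
    right = [i for i, r in enumerate(rows) if r[attr] > threshold]
    return {
        "attr": attr,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left],
                           [labels[i] for i in left], max_depth - 1),
        "right": build_tree([rows[i] for i in right],
                            [labels[i] for i in right], max_depth - 1),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if row[tree["attr"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]
```

This sketch keeps the whole training set in memory and rescans the current partition for every candidate split, so it shows the greedy principle only; it is not itself a scalable construction algorithm.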
Historical Background
Decision trees, in particular classification trees, have a long history in both the statistics [4] and the machine learning communities [12, 13]. Scalability was not much of a concern until the advent of data mining brought training datasets that were orders of magnitude larger than those in traditional applications in machine learning and statistics.
Scalability concerns in classification started with the work of Agrawal et al., who presented an interval classifier that generated classification functions that distinguish the different groups...
Recommended Reading
Agrawal R, Ghosh SP, Imielinski T, Iyer BR, Swami AN. An interval classifier for database mining applications. In: Proceedings of the 18th International Conference on Very Large Data Bases; 1992. p. 560–73.
Agrawal R, Imielinski T, Swami AN. Database mining: a performance perspective. IEEE Trans Knowl Data Eng. 1993;5(6):914–25.
Alsabti K, Ranka S, Singh V. CLOUDS: a decision tree classifier for large datasets. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining; 1998. p. 2–8.
Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Belmont: Wadsworth; 1984.
Gehrke J, Ganti V, Ramakrishnan R, Loh W-Y. BOAT – optimistic decision tree construction. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1999. p. 169–80.
Gehrke J, Ramakrishnan R, Ganti V. RainForest – a framework for fast decision tree construction of large datasets. Data Min Knowl Dis. 2000;4(2/3):127–62.
Graefe G, Fayyad U, Chaudhuri S. On the efficient gathering of sufficient statistics for classification from large SQL databases. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining; 1998. p. 204–8.
Lim T-S, Loh W-Y, Shih Y-S. A comparison of prediction accuracy, complexity, and training time of 33 old and new classification algorithms. Mach Learn. 2000;40(3):203–28.
Mehta M, Rissanen J, Agrawal R. MDL-based decision tree pruning. In: Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining; 1995.
Mehta M, Agrawal R, Rissanen J. SLIQ: a fast scalable classifier for data mining. In: Advances in Database Technology, Proceedings of the 5th International Conference on Extending Database Technology; 1996.
Murthy SK. Automatic construction of decision trees from data: a multi-disciplinary survey. Data Min Knowl Dis. 1998;2(4):345–89.
Quinlan JR. Induction of decision trees. Mach Learn. 1986;1(1):81–106.
Quinlan JR. C4.5: programs for machine learning. San Mateo: Morgan Kaufmann; 1993.
Rastogi R, Shim K. PUBLIC: a decision tree classifier that integrates building and pruning. In: Proceedings of the 24th International Conference on Very Large Data Bases; 1998. p. 404–15.
Shafer J, Agrawal R, Mehta M. SPRINT: a scalable parallel classifier for data mining. In: Proceedings of the 22nd International Conference on Very Large Data Bases; 1996.
Sreenivas MK, AlSabti K, Ranka S. Parallel out-of-core decision tree classifiers. In: Kargupta H, Chan P, editors. Advances in distributed and parallel knowledge discovery. Cambridge, MA: AAAI; 2000. p. 317–36.
Srivastava A, Han E, Kumar V, Singh V. Parallel formulations of decision-tree classification algorithms. Data Min Knowl Dis. 1999;3(3):237–61.
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
Cite this entry
Gehrke, J. (2018). Scalable Decision Tree Construction. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_555
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering