Scalable Decision Tree Construction

Gehrke, Johannes

doi:10.1007/978-1-4614-8265-9_555

Synonyms

Scalable classification tree construction; Scalable top-down decision tree construction; Tree-structured classifier

Definition

Decision trees are popular classification models. They are usually constructed greedily, top-down, from a training dataset. In many modern applications the training dataset is very large, so decision tree construction algorithms that scale with the size of the training dataset are needed.
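As an illustration of greedy top-down construction, the following is a minimal sketch, not any particular published algorithm. It assumes numeric attributes, binary splits of the form "attribute <= threshold", Gini impurity as the split criterion, and non-tuple class labels; the names gini, best_split, build_tree, and predict are hypothetical. It also assumes the whole training dataset fits in main memory, which is the assumption that scalable algorithms have to relax.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (attribute, threshold) pair with the lowest weighted
    Gini impurity, or None if no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for a in range(len(rows[0])):
        # Every distinct value except the largest is a candidate threshold,
        # so both partitions are guaranteed to be non-empty.
        for t in sorted({r[a] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[a] <= t]
            right = [y for r, y in zip(rows, labels) if r[a] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (a, t), score
    return best


def build_tree(rows, labels, max_depth=5):
    """Greedy top-down construction: pick the locally best split, partition
    the data, and recurse. Leaves are class labels; internal nodes are
    tuples (attribute, threshold, left_subtree, right_subtree)."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority class
    a, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[a] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[a] > t]
    return (
        a,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    )


def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, tuple):
        a, t, left, right = tree
        tree = left if row[a] <= t else right
    return tree


rows = [[1.0], [2.0], [3.0], [4.0]]
labels = ["a", "a", "b", "b"]
tree = build_tree(rows, labels)
```

Note that best_split rescans all records at a node for every candidate threshold. This is acceptable for small in-memory data, but the repeated passes over the training records become the dominant cost as the dataset grows.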

Historical Background

Decision trees, in particular classification trees, have a long history in both the statistics [4] and the machine learning communities [12, 13]. Scalability was not much of a concern until the advent of data mining brought training datasets that were orders of magnitude larger than those in traditional machine learning and statistics applications.

Scalability concerns in classification started with the work of Agrawal et al., who presented an interval classifier that generates classification functions that distinguish the different groups...

Recommended Reading

1. Agrawal R, Ghosh SP, Imielinski T, Iyer BR, Swami AN. An interval classifier for database mining applications. In: Proceedings of the 18th International Conference on Very Large Data Bases; 1992. p. 560–73.
2. Agrawal R, Imielinski T, Swami AN. Database mining: a performance perspective. IEEE Trans Knowl Data Eng. 1993;5(6):914–25.
3. Alsabti K, Ranka S, Singh V. Clouds: a decision tree classifier for large datasets. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining; 1998. p. 2–8.
4. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Belmont: Wadsworth; 1984.
5. Gehrke J, Ganti V, Ramakrishnan R, Loh W-Y. BOAT – optimistic decision tree construction. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1999. p. 169–80.
6. Gehrke J, Ramakrishnan R, Ganti V. Rainforest – a framework for fast decision tree construction of large datasets. Data Min Knowl Disc. 2000;4(2/3):127–62.
7. Graefe G, Fayyad U, Chaudhuri S. On the efficient gathering of sufficient statistics for classification from large SQL databases. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining; 1998. p. 204–8.
8. Lim T-S, Loh W-Y, Shih Y-S. A comparison of prediction accuracy, complexity, and training time of 33 old and new classification algorithms. Mach Learn. 2000;40(3):203–28.
9. Mehta M, Rissanen J, Agrawal R. MDL-based decision tree pruning. In: Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining; 1995.
10. Mehta M, Agrawal R, Rissanen J. SLIQ: a fast scalable classifier for data mining. In: Advances in Database Technology, Proceedings of the 5th International Conference on Extending Database Technology; 1996.
11. Murthy SK. Automatic construction of decision trees from data: a multi-disciplinary survey. Data Min Knowl Disc. 1998;2(4):345–89.
12. Quinlan JR. Induction of decision trees. Mach Learn. 1986;1(1):81–106.
13. Quinlan JR. C4.5: programs for machine learning. San Mateo: Morgan Kaufmann; 1993.
14. Rastogi R, Shim K. PUBLIC: a decision tree classifier that integrates building and pruning. In: Proceedings of the 24th International Conference on Very Large Data Bases; 1998. p. 404–15.
15. Shafer J, Agrawal R, Mehta M. SPRINT: a scalable parallel classifier for data mining. In: Proceedings of the 22nd International Conference on Very Large Data Bases; 1996.
16. Sreenivas MK, AlSabti K, Ranka S. Parallel out-of-core decision tree classifiers. In: Kargupta H, Chan P, editors. Advances in distributed and parallel knowledge discovery. Cambridge, MA: AAAI; 2000. p. 317–36.
17. Srivastava A, Han E, Kumar V, Singh V. Parallel formulations of decision-tree classification algorithms. Data Min Knowl Disc. 1999;3(3):237–61.

Author information

Authors and Affiliations

Cornell University, Ithaca, NY, USA
Johannes Gehrke

Editor information

Editors and Affiliations

Georgia Institute of Technology College of Computing, Atlanta, GA, USA
Ling Liu
University of Waterloo School of Computer Science, Waterloo, ON, Canada
M. Tamer Özsu

Section Editor information

School of Elec. Eng. and Computer Science, Seoul National Univ., Seoul, Republic of Korea
Kyuseok Shim

About this entry

Cite this entry

Gehrke, J. (2018). Scalable Decision Tree Construction. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_555

DOI: https://doi.org/10.1007/978-1-4614-8265-9_555
Published: 07 December 2018
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering