Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Scalable Decision Tree Construction

  • Johannes Gehrke
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_555


Scalable classification tree construction; Scalable top-down decision tree construction; Tree-structured classifier


Decision trees are popular classification models. They are usually constructed greedily, top-down, from a training dataset. In many modern applications, the training dataset is very large, so decision tree construction algorithms that scale with the size of the training dataset are needed.
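
To make "greedily top-down" concrete, the following is a minimal in-memory sketch, not one of the scalable algorithms this entry surveys. It assumes numeric attributes, binary splits of the form attribute <= threshold, Gini impurity as the split criterion, and a depth limit as the stopping rule. All of these choices, and the names gini, best_split, build_tree, and predict, are illustrative assumptions rather than anything prescribed by the entry.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, attribute index, threshold) of the best
    binary split, or None if no split separates the rows."""
    best = None
    n = len(rows)
    for attr in range(len(rows[0])):
        for threshold in sorted({row[attr] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[attr] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[attr] > threshold]
            if not left or not right:
                continue  # a split must send rows to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    return best


def build_tree(rows, labels, max_depth=5):
    """Greedy top-down construction: pick the locally best split at this
    node, partition the data, and recurse; earlier choices are never revisited."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": majority}
    _, attr, threshold = split
    left_idx = [i for i, row in enumerate(rows) if row[attr] <= threshold]
    right_idx = [i for i, row in enumerate(rows) if row[attr] > threshold]
    return {
        "attr": attr,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left_idx],
                           [labels[i] for i in left_idx], max_depth - 1),
        "right": build_tree([rows[i] for i in right_idx],
                            [labels[i] for i in right_idx], max_depth - 1),
    }


def predict(tree, row):
    """Follow the split tests from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if row[tree["attr"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]
```

Note that best_split rescans every row of the current node for each candidate threshold, and build_tree copies each partition into new lists. Both steps assume the training data fits in main memory, which is the assumption that very large training datasets break.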

Historical Background

Decision trees, in particular classification trees, have a long history in both the statistics [4] and machine learning [12, 13] communities. Scalability was not much of a concern until the advent of data mining brought training datasets that were orders of magnitude larger than those in traditional machine learning and statistics applications.

Scalability concerns in classification started with the work by Agrawal et al., who presented an interval classifier that generated classification functions that distinguish the different groups...

Recommended Reading

  1. Agrawal R, Ghosh SP, Imielinski T, Iyer BR, Swami AN. An interval classifier for database mining applications. In: Proceedings of the 18th International Conference on Very Large Data Bases; 1992. p. 560–73.
  2. Agrawal R, Imielinski T, Swami AN. Database mining: a performance perspective. IEEE Trans Knowl Data Eng. 1993;5(6):914–25.
  3. Alsabti K, Ranka S, Singh V. Clouds: a decision tree classifier for large datasets. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining; 1998. p. 2–8.
  4. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Belmont: Wadsworth; 1984.
  5. Gehrke J, Ganti V, Ramakrishnan R, Loh W-Y. BOAT – optimistic decision tree construction. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1999. p. 169–80.
  6. Gehrke J, Ramakrishnan R, Ganti V. Rainforest – a framework for fast decision tree construction of large datasets. Data Min Knowl Disc. 2000;4(2/3):127–62.
  7. Graefe G, Fayyad U, Chaudhuri S. On the efficient gathering of sufficient statistics for classification from large SQL databases. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining; 1998. p. 204–8.
  8. Lim T-S, Loh W-Y, Shih Y-S. A comparison of prediction accuracy, complexity, and training time of 33 old and new classification algorithms. Mach Learn. 2000;40(3):203–28.
  9. Mehta M, Rissanen J, Agrawal R. MDL-based decision tree pruning. In: Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining; 1995.
  10. Mehta M, Agrawal R, Rissanen J. SLIQ: a fast scalable classifier for data mining. In: Advances in Database Technology, Proceedings of the 5th International Conference on Extending Database Technology; 1996.
  11. Murthy SK. Automatic construction of decision trees from data: a multi-disciplinary survey. Data Min Knowl Disc. 1998;2(4):345–89.
  12. Quinlan JR. Induction of decision trees. Mach Learn. 1986;1(1):81–106.
  13. Quinlan JR. C4.5: programs for machine learning. San Mateo: Morgan Kaufmann; 1993.
  14. Rastogi R, Shim K. PUBLIC: a decision tree classifier that integrates building and pruning. In: Proceedings of the 24th International Conference on Very Large Data Bases; 1998. p. 404–15.
  15. Shafer J, Agrawal R, Mehta M. SPRINT: a scalable parallel classifier for data mining. In: Proceedings of the 22nd International Conference on Very Large Data Bases; 1996.
  16. Sreenivas MK, AlSabti K, Ranka S. Parallel out-of-core decision tree classifiers. In: Kargupta H, Chan P, editors. Advances in distributed and parallel knowledge discovery. Cambridge, MA: AAAI; 2000. p. 317–36.
  17. Srivastava A, Han E, Kumar V, Singh V. Parallel formulations of decision-tree classification algorithms. Data Min Knowl Disc. 1999;3(3):237–61.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Cornell University, Ithaca, USA

Section editors and affiliations

  • Kyuseok Shim (1)
  1. School of Elec. Eng. and Computer Science, Seoul National Univ., Seoul, Republic of Korea