Part of the book series: Modeling and Optimization in Science and Technologies ((MOST,volume 4))

Abstract

Data mining methods are widely used across many disciplines to identify patterns, rules, or associations in huge volumes of data. While black-box methods such as neural networks and support vector machines have in the past been heavily used for predicting patterns, classes, or events, methods with explanation capability, such as decision tree induction, have seldom been preferred. This chapter therefore gives an introduction to decision tree induction. We first present the basic principle, the advantageous properties of decision tree induction methods, and a representation of decision trees that lets a user understand and describe a tree in a common way. We then explain the overall decision tree induction algorithm, together with different methods, developed by us and others, for its most important functions: attribute selection, attribute discretization, and pruning. We explain how the learnt model can be fitted to the expert's knowledge and how the classification performance can be improved. The problem of feature subset selection by decision tree induction is described. The quality of the learnt model should be checked not only by overall accuracy; we also explain more specific measures that describe the performance of the model in greater detail. We present new quantitative measures that describe changes in the structure of a tree, in order to help the expert interpret the differences between two trees learnt from the same domain. Finally, we summarize the chapter and give an outlook.
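The overall procedure the abstract refers to, recursive tree growing driven by an attribute-selection criterion, can be illustrated with a minimal sketch. This is an illustrative ID3-style implementation using entropy-based information gain on nominal attributes, not the chapter's actual algorithm; all function names here are our own.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain achieved by splitting on attribute index `attr`."""
    n = len(labels)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in parts.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Recursively grow a tree; leaves hold the majority class label."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    groups = {}
    for row, y in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(y)
    rest = [a for a in attrs if a != best]
    return {"attr": best,
            "branches": {v: build_tree(r, l, rest)
                         for v, (r, l) in groups.items()}}

def classify(node, row, default=None):
    """Walk the tree down to a leaf; unseen values fall back to `default`."""
    while isinstance(node, dict):
        node = node["branches"].get(row[node["attr"]], default)
    return node
```

A real induction algorithm would add the functions the chapter discusses on top of this skeleton: discretization of numeric attributes before or during splitting, and pruning of the grown tree to counter overfitting.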




Correspondence to Petra Perner.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Perner, P. (2015). Decision Tree Induction Methods and Their Application to Big Data. In: Xhafa, F., Barolli, L., Barolli, A., Papajorgji, P. (eds) Modeling and Processing for Next-Generation Big-Data Technologies. Modeling and Optimization in Science and Technologies, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-319-09177-8_3

  • DOI: https://doi.org/10.1007/978-3-319-09177-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09176-1

  • Online ISBN: 978-3-319-09177-8

  • eBook Packages: Engineering (R0)
