Decision Tree Induction Methods and Their Application to Big Data

Perner, Petra

doi:10.1007/978-3-319-09177-8_3


Part of the book series: Modeling and Optimization in Science and Technologies (MOST, volume 4)


Abstract

Data mining methods are widely used across many disciplines to identify patterns, rules, or associations in huge volumes of data. While black-box methods such as neural nets and support vector machines have in the past been heavily used to predict patterns, classes, or events, methods with explanation capability, such as decision tree induction methods, are seldom preferred. This chapter therefore gives an introduction to decision tree induction. We first describe the basic principle, the advantageous properties of decision tree induction methods, and a representation of decision trees that lets a user understand and describe a tree in a common way. We then explain the overall decision tree induction algorithm, as well as different methods, developed by us and others, for its most important functions, such as attribute selection, attribute discretization, and pruning. We explain how the learnt model can be fitted to the expert's knowledge and how its classification performance can be improved. The problem of feature subset selection by decision tree induction is described. Because the quality of a learnt model should not be judged by overall accuracy alone, we also explain more specific measures that describe the model's performance in more detail. We present new quantitative measures that describe changes in the structure of a tree, to help the expert interpret the differences between two trees learnt from the same domain. Finally, we summarize the chapter and give an outlook.
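The abstract names attribute selection as a central function of the overall induction algorithm. As a rough illustration only, the Python sketch below shows generic top-down induction on categorical attributes, choosing each split by information gain (the criterion used in Quinlan's ID3). It is not the chapter's own method: numeric attributes would first have to be discretized, pruning is left out, and the function names (`entropy`, `information_gain`, `induce_tree`, `classify`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def induce_tree(rows, labels, attrs):
    """Top-down induction: pick the best attribute, split the data, recurse.

    Returns either a class label (leaf) or a tuple (attr, {value: subtree}).
    """
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes remain to be tested.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    # Stop when no attribute improves class purity.
    if information_gain(rows, labels, best) <= 0:
        return majority
    branches = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = induce_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return (best, branches)


def classify(tree, row, default=None):
    """Follow attribute tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default  # attribute value never seen during training
        tree = branches[row[attr]]
    return tree


if __name__ == "__main__":
    # Hypothetical toy data set with two categorical attributes.
    rows = [
        {"outlook": "sunny", "windy": "no"},
        {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain", "windy": "no"},
        {"outlook": "rain", "windy": "yes"},
        {"outlook": "overcast", "windy": "no"},
    ]
    labels = ["no", "no", "yes", "no", "yes"]
    tree = induce_tree(rows, labels, ["outlook", "windy"])
    print(tree)
    print(classify(tree, {"outlook": "rain", "windy": "no"}))  # -> yes
```

On the toy data, "outlook" has the higher gain at the root, so it is tested first and "windy" is tested only under the "rain" branch. The nested-tuple representation is chosen here purely for brevity; it makes the learnt rules easy to read off, which is the explanation capability the abstract contrasts with black-box methods.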



Author information

Authors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, Kohlenstr. 2, 04251, Leipzig, Germany
Petra Perner

Corresponding author

Correspondence to Petra Perner.

Editor information

Editors and Affiliations

Universitat Politècnica de Catalunya, Barcelona, Spain
Fatos Xhafa
Fukuoka Institute of Technology (FIT), Fukuoka, Fukuoka, Japan
Leonard Barolli
University of Salerno, Salerno, Italy
Admir Barolli
Canadian Institute of Technology, Tirana, Albania
Petraq Papajorgji

About this chapter

Cite this chapter

Perner, P. (2015). Decision Tree Induction Methods and Their Application to Big Data. In: Xhafa, F., Barolli, L., Barolli, A., Papajorgji, P. (eds) Modeling and Processing for Next-Generation Big-Data Technologies. Modeling and Optimization in Science and Technologies, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-319-09177-8_3

DOI: https://doi.org/10.1007/978-3-319-09177-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09176-1
Online ISBN: 978-3-319-09177-8
eBook Packages: Engineering, Engineering (R0)