Decision Trees

Reference work entry in Computational Complexity

Article Outline

Glossary

Definition of the Subject

Introduction

The Basics of Decision Trees

Induction of Decision Trees

Evaluation of Quality

Applications and Available Software

Future Directions

Bibliography


Glossary

Accuracy :

The most important quality measure of an induced decision tree classifier. The most general form is the overall accuracy, defined as the percentage of correctly classified instances among all instances (both correctly and incorrectly classified). Accuracy is usually measured on both the training set and the testing set.
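As a minimal illustration (a Python sketch, not part of the original entry; the classifier argument stands for any induced decision tree), overall accuracy can be computed like this:

    def overall_accuracy(classifier, objects, true_classes):
        # Percentage of instances whose predicted class equals the actual class.
        correct = sum(1 for obj, actual in zip(objects, true_classes)
                      if classifier(obj) == actual)
        return 100.0 * correct / len(objects)

Evaluated on the training set this measures how well the tree fits the known cases; evaluated on the testing set it estimates how well the tree generalizes.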

Attribute :

A feature that describes one aspect of an object (training or testing) used with a decision tree. An object is typically represented as a vector of attribute values. There are two types of attributes: continuous attributes, whose domain is numerical, and discrete attributes, whose domain is a set of predetermined values. One distinguished attribute is the decision class (the dependent attribute); the remaining (independent) attributes are used to determine its value.

Attribute node :

Also called a test node. An internal node of the decision tree model; the value of the corresponding attribute of the object being classified determines which branch is taken from this node.

Classification :

A process of mapping instances (i.e., training or testing objects), represented by attribute-value vectors, to decision classes. If the predicted decision class of an object equals its actual decision class, the classification of the object is accurate. The aim of classification methods is to classify objects with the highest possible accuracy.

Classifier :

A model, built from the training set, that is used for classification. The input to a classifier is an object (a vector of known attribute values); the output is the predicted decision class for that object.

Decision node :

A leaf of the decision tree model (also simply called a decision), containing one of the possible decision classes. It determines the predicted decision class of any object that arrives at this leaf on its path through the decision tree model.
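To make the two node types concrete, the following Python sketch (illustrative class and field names, assumed for this example; discrete attributes only, a continuous attribute would use a threshold test instead) shows a decision tree model and the traversal that classifies an object:

    class DecisionNode:
        # Leaf (decision) node: holds one of the possible decision classes.
        def __init__(self, decision_class):
            self.decision_class = decision_class

    class AttributeNode:
        # Internal (test) node: branches on the value of one attribute.
        def __init__(self, attribute, branches):
            self.attribute = attribute  # index into the attribute-value vector
            self.branches = branches    # maps attribute value -> child node

    def classify(node, obj):
        # Follow the object's attribute values from the root down to a leaf.
        while isinstance(node, AttributeNode):
            node = node.branches[obj[node.attribute]]
        return node.decision_class

    # Example: a one-test tree that checks attribute 0 of the object.
    tree = AttributeNode(0, {"high": DecisionNode("sick"),
                             "normal": DecisionNode("healthy")})
    print(classify(tree, ["high", 37.5]))  # -> sick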

Instance :

Also called an object (training or testing), represented by an attribute-value vector. Instances are used to describe the domain data.

Induction :

Inductive inference is the process of moving from concrete examples to general models; the goal is to learn how to classify objects by analyzing a set of instances (already solved cases) whose classes are known. Instances are typically represented as attribute-value vectors. The learning input consists of a set of such vectors, each belonging to a known class, and the output is a mapping from attribute values to classes. This mapping should accurately classify both the given instances (the training set) and other unseen instances (a testing set).
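The classic top-down induction scheme in the spirit of ID3 [39] can be sketched as follows (a sketch under stated assumptions: it reuses the DecisionNode and AttributeNode classes from the sketch above and the best_split function sketched under “Split selection” below):

    from collections import Counter

    def induce_tree(objects, classes, attributes):
        # Stop when the node is pure or no attributes are left to test;
        # the leaf then predicts the majority class of its training objects.
        if len(set(classes)) == 1 or not attributes:
            return DecisionNode(Counter(classes).most_common(1)[0][0])
        best = best_split(objects, classes, attributes)
        branches = {}
        for value in {obj[best] for obj in objects}:
            sub = [(o, c) for o, c in zip(objects, classes) if o[best] == value]
            remaining = [a for a in attributes if a != best]
            branches[value] = induce_tree([o for o, _ in sub],
                                          [c for _, c in sub], remaining)
        return AttributeNode(best, branches)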

Split selection :

A method used during decision tree induction to select the most appropriate attribute, and the splits on it, at each attribute (test) node of the tree. Split selection is usually based on some impurity measure and is considered the most important aspect of decision tree learning.
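The entry leaves the impurity measure open; a common choice is the Shannon entropy [45], used for the information-gain criterion of ID3/C4.5 [39,41]. A minimal sketch for discrete attributes (function names are assumptions of this sketch), which also supplies the best_split used in the induction sketch above:

    import math
    from collections import Counter

    def entropy(classes):
        # Shannon entropy of the class distribution: an impurity measure.
        total = len(classes)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(classes).values())

    def information_gain(objects, classes, attribute):
        # Impurity reduction achieved by splitting on one discrete attribute.
        partitions = {}
        for obj, cls in zip(objects, classes):
            partitions.setdefault(obj[attribute], []).append(cls)
        remainder = sum(len(part) / len(classes) * entropy(part)
                        for part in partitions.values())
        return entropy(classes) - remainder

    def best_split(objects, classes, attributes):
        # Select the attribute whose split yields the largest gain.
        return max(attributes, key=lambda a: information_gain(objects, classes, a))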

Training object :

An object used for the induction of a decision tree. In a training object, both the attribute values and the decision class are known. All the training objects together constitute a training set, which is the source of the “domain knowledge” that the decision tree will try to represent.

Testing object :

An object used for the evaluation of a decision tree. In a testing object the attribute values are known, but the decision class is withheld from the decision tree. All the testing objects together constitute a testing set, which is used to evaluate the quality of an induced decision tree (its classification accuracy).

Training set :

A prepared set of training objects.

Testing set :

A prepared set of testing objects.
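As an illustration of how the two sets are typically prepared from the available labeled objects, a simple holdout split (the 70/30 ratio and the function name are assumptions of this sketch, not prescribed by the entry):

    import random

    def holdout_split(objects, classes, train_fraction=0.7, seed=0):
        # Randomly partition labeled objects into a training and a testing set.
        indices = list(range(len(objects)))
        random.Random(seed).shuffle(indices)
        cut = int(train_fraction * len(indices))

        def subset(idx):
            return [objects[i] for i in idx], [classes[i] for i in idx]

        return subset(indices[:cut]), subset(indices[cut:])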

Bibliography

Primary Literature

  1. Babic SH, Kokol P, Stiglic MM (2000) Fuzzy decision trees in the support of breastfeeding. In: Proceedings of the 13th IEEE Symposium on Computer-Based Medical Systems CBMS'2000, Houston, pp 7-11

  2. Banerjee A (1994) Initializing neural networks using decision trees. In: Proceedings of the International Workshop on Computational Learning Theory and Natural Learning Systems, Cambridge, pp 3-15

  3. Bonner G (2001) Decision making for health care professionals: use of decision trees within the community mental health setting. J Adv Nursing 35:349-356

  4. Breiman L (1996) Bagging predictors. Mach Learn 24:123-140

  5. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont

  6. Cantu-Paz E, Kamath C (2000) Using evolutionary algorithms to induce oblique decision trees. In: Proceedings of the Genetic and Evolutionary Computation Conference GECCO-2000, Las Vegas, pp 1053-1060

  7. Craven MW, Shavlik JW (1996) Extracting tree-structured representations of trained networks. In: Advances in Neural Information Processing Systems, vol 8. MIT Press, Cambridge

  8. Crawford S (1989) Extensions to the CART algorithm. Int J Man-Mach Stud 31(2):197-217

  9. Cremilleux B, Robert C (1997) A theoretical framework for decision trees in uncertain domains: Application to medical data sets. In: Lecture Notes in Artificial Intelligence, vol 1211. Springer, London, pp 145-156

  10. Dantchev N (1996) Therapeutic decision trees in psychiatry. Encephale-Revue Psychiatr Clinique Biol Therap 22(3):205-214

  11. Dietterich TG, Kong EB (1995) Machine learning bias, statistical bias and statistical variance of decision tree algorithms. Technical report, Department of Computer Science, Oregon State University, Corvallis

  12. Feigenbaum EA, Simon HA (1962) A theory of the serial position effect. Br J Psychol 53:307-320

  13. Freund Y (1995) Boosting a weak learning algorithm by majority. Inf Comput 121:256-285

  14. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Machine Learning: Proceedings of the Thirteenth International Conference. Morgan Kaufmann, San Francisco, pp 148-156

  15. Gambhir SS (1999) Decision analysis in nuclear medicine. J Nucl Med 40(9):1570-1581

  16. Gehrke J (2003) Decision Trees. In: Ye N (ed) The Handbook of Data Mining. Lawrence Erlbaum, Mahwah

  17. Goebel M, Gruenwald L (1999) A survey of data mining software tools. SIGKDD Explor 1(1):20-33

  18. Goldberg DE (1989) Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading

  19. Hand D (1997) Construction and assessment of classification rules. Wiley, Chichester

  20. Heath D et al (1993) k-DT: A multi-tree learning method. In: Proceedings of the Second International Workshop on Multistrategy Learning, Harpers Ferry, pp 138-149

  21. Heath D et al (1993) Learning Oblique Decision Trees. In: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence IJCAI-93, pp 1002-1007

  22. Ho TK (1998) The Random Subspace Method for Constructing Decision Forests. IEEE Trans Pattern Anal Mach Intell 20(8):832-844

  23. Hunt EB, Marin J, Stone PT (1966) Experiments in Induction. Academic Press, New York, pp 45-69

  24. Jones JK (2001) The role of data mining technology in the identification of signals of possible adverse drug reactions: Value and limitations. Curr Ther Res-Clin Exp 62(9):664-672

  25. Kirkpatrick S et al (1983) Optimization by Simulated Annealing. Science 220(4598):671-680

  26. Kokol P, Zorman M, Stiglic MM, Malcic I (1998) The limitations of decision trees and automatic learning in real world medical decision making. In: Proceedings of the 9th World Congress on Medical Informatics MEDINFO'98, 52, pp 529-533

  27. Letourneau S, Jensen L (1998) Impact of a decision tree on chronic wound care. J Wound Ostomy Continence Nurs 25:240-247

  28. Lim T-S, Loh W-Y, Shih Y-S (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach Learn 48:203-228

  29. Murthy KVS (1997) On Growing Better Decision Trees from Data. PhD dissertation, Johns Hopkins University, Baltimore

  30. Neapolitan R, Naimipour K (1996) Foundations of Algorithms. DC Heath, Lexington

  31. Nikolaev N, Slavov V (1998) Inductive genetic programming with decision trees. Intell Data Anal Int J 2(1):31-44

  32. Ohno-Machado L, Lacson R, Massad E (2000) Decision trees and fuzzy logic: A comparison of models for the selection of measles vaccination strategies in Brazil. In: Proceedings of the AMIA Symposium 2000, Los Angeles, pp 625-629

  33. Paterson A, Niblett TB (1982) ACLS Manual. Intelligent Terminals, Edinburgh

  34. Podgorelec V (2001) Intelligent systems design and knowledge discovery with automatic programming. PhD thesis, University of Maribor

  35. Podgorelec V, Kokol P (1999) Induction of medical decision trees with genetic algorithms. In: Proceedings of the International ICSC Congress on Computational Intelligence Methods and Applications CIMA. Academic Press, Rochester

  36. Podgorelec V, Kokol P (2001) Towards more optimal medical diagnosing with evolutionary algorithms. J Med Syst 25(3):195-219

  37. Podgorelec V, Kokol P (2001) Evolutionary decision forests - decision making with multiple evolutionary constructed decision trees. In: Problems in Applied Mathematics and Computational Intelligence. WSES Press, pp 97-103

  38. Quinlan JR (1979) Discovering rules by induction from large collections of examples. In: Michie D (ed) Expert Systems in the Micro Electronic Age. University Press, Edinburgh, pp 168-201

  39. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81-106

  40. Quinlan JR (1987) Simplifying decision trees. Int J Man-Mach Stud 27:221-234

  41. Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco

  42. Rich E, Knight K (1991) Artificial Intelligence, 2nd edn. McGraw-Hill, New York

  43. Sanders GD, Hagerty CG, Sonnenberg FA, Hlatky MA, Owens DK (2000) Distributed decision support using a web-based interface: prevention of sudden cardiac death. Med Decis Making 19(2):157-166

  44. Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197-227

  45. Shannon C, Weaver W (1949) The mathematical theory of communication. University of Illinois Press, Champaign

  46. Shlien S (1992) Multiple binary decision tree classifiers. Pattern Recognit Lett 23(7):757-763

  47. Sims CJ, Meyn L, Caruana R, Rao RB, Mitchell T, Krohn M (2000) Predicting cesarean delivery with decision tree models. Am J Obstet Gynecol 183:1198-1206

  48. Smyth P, Goodman RM (1991) Rule induction using information theory. In: Piatetsky-Shapiro G, Frawley WJ (eds) Knowledge Discovery in Databases. AAAI Press, Cambridge, pp 159-176

  49. Sprogar M, Kokol P, Hleb S, Podgorelec V, Zorman M (2000) Vector decision trees. Intell Data Anal 4(3-4):305-321

  50. Tou JT, Gonzalez RC (1974) Pattern Recognition Principles. Addison-Wesley, Reading

  51. Tsien CL, Fraser HSF, Long WJ, Kennedy RL (1998) Using classification tree and logistic regression methods to diagnose myocardial infarction. In: Proceedings of the 9th World Congress on Medical Informatics MEDINFO'98, 52, pp 493-497

  52. Tsien CL, Kohane IS, McIntosh N (2000) Multiple signal integration by decision tree induction to detect artifacts in the neonatal intensive care unit. Artif Intell Med 19(3):189-202

  53. Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2):161-186

  54. Utgoff PE (1989) Perceptron trees: a case study in hybrid concept representations. Connect Sci 1:377-391

  55. White AP, Liu WZ (1994) Bias in information-based measures in decision tree induction. Mach Learn 15:321-329

  56. Zorman M, Hleb S, Sprogar M (1999) Advanced tool for building decision trees MtDecit 2.0. In: Arabnia HR (ed) Proceedings of the International Conference on Artificial Intelligence ICAI-99, Las Vegas

  57. Zorman M, Kokol P, Podgorelec V (2000) Medical decision making supported by hybrid decision trees. In: Proceedings of the ICSC Symposia on Intelligent Systems & Applications ISA'2000. ICSC Academic Press, Wollongong

  58. Zorman M, Podgorelec V, Kokol P, Peterson M, Lane J (2000) Decision tree's induction strategies evaluated on a hard real world problem. In: Proceedings of the 13th IEEE Symposium on Computer-Based Medical Systems CBMS'2000, Houston, pp 19-24

  59. Zorman M, Sigut JF, de la Rosa SJL, Alayón S, Kokol P, Verlič M (2006) Evolutionary built decision trees for supervised segmentation of follicular lymphoma images. In: Proceedings of the 9th IASTED International Conference on Intelligent Systems and Control, Honolulu, pp 182-187

Books and Reviews

  1. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont

  2. Han J, Kamber M (2006) Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco

  3. Hand D, Mannila H, Smyth P (2001) Principles of Data Mining. MIT Press, Cambridge

  4. Kantardzic M (2003) Data Mining: Concepts, Models, Methods, and Algorithms. Wiley, San Francisco

  5. Mitchell TM (1997) Machine Learning. McGraw-Hill, New York

  6. Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco

  7. Ye N (ed) (2003) The Handbook of Data Mining. Lawrence Erlbaum, Mahwah


Copyright information

© 2012 Springer-Verlag

About this entry

Cite this entry

Podgorelec, V., Zorman, M. (2012). Decision Trees. In: Meyers, R. (ed) Computational Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1800-9_53
