Article Outline
Glossary
Definition of the Subject
Introduction
The Basics of Decision Trees
Induction of Decision Trees
Evaluation of Quality
Applications and Available Software
Future Directions
Bibliography
Glossary
- Accuracy: The most important quality measure of an induced decision tree classifier. The most general is the overall accuracy, defined as the percentage of correctly classified instances among all instances (correctly and incorrectly classified). Accuracy is usually measured on both the training set and the testing set.
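The overall accuracy defined above can be sketched in a few lines; the labels below are illustrative placeholders, not data from the article.

```python
# Overall accuracy: the percentage of correctly classified instances
# among all instances (correct and incorrect alike).

def accuracy(actual, predicted):
    """Percentage of instances whose predicted class matches the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)

actual    = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no",  "yes", "no"]
print(accuracy(actual, predicted))  # 80.0 (4 of 5 instances correct)
```

In practice this is computed twice, once on the training set and once on the testing set, as noted in the definition.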
- Attribute: A feature that describes one aspect of an object (training or testing) used for a decision tree. An object is typically represented as a vector of attribute values. There are two types of attributes: continuous attributes, whose domain is numerical, and discrete attributes, whose domain is a set of predetermined values. One distinguished attribute, called the decision class, is the dependent attribute; the remaining (independent) attributes are used to determine the value of the decision class.
- Attribute node: Also called a test node. An internal node in the decision tree model, used to select a branch from this node based on the value of the corresponding attribute of the object being classified.
- Classification: The process of mapping instances (i.e., training or testing objects) represented by attribute-value vectors to decision classes. If the predicted decision class of an object equals its actual decision class, the classification of that object is accurate. The aim of classification methods is to classify objects with the highest possible accuracy.
- Classifier: A model built upon the training set and used for classification. The input to a classifier is an object (a vector of known attribute values) and the output is the predicted decision class for this object.
- Decision node: A leaf in a decision tree model (also simply called a decision) containing one of the possible decision classes. It determines the predicted decision class of an object being classified that arrives at the leaf on its path through the decision tree model.
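The two node types defined above can be sketched as a minimal data structure; the "outlook"/"humidity" attributes and their values are hypothetical, chosen only to illustrate how an object travels from the root to a leaf.

```python
# Attribute (test) node: branches on the value of one attribute.
# Decision node: a leaf holding one of the possible decision classes.

class DecisionNode:
    def __init__(self, decision_class):
        self.decision_class = decision_class  # the predicted class

class AttributeNode:
    def __init__(self, attribute, branches):
        self.attribute = attribute  # attribute tested at this node
        self.branches = branches    # attribute value -> child node

def classify(node, obj):
    """Follow branches by the object's attribute values until a leaf."""
    while isinstance(node, AttributeNode):
        node = node.branches[obj[node.attribute]]
    return node.decision_class

tree = AttributeNode("outlook", {
    "sunny": AttributeNode("humidity", {
        "high": DecisionNode("no"),
        "normal": DecisionNode("yes")}),
    "overcast": DecisionNode("yes"),
})
print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```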
- Instance: Also called an object (training or testing), represented by an attribute-value vector. Instances are used to describe the domain data.
- Induction: Inductive inference is the process of moving from concrete examples to general models, where the goal is to learn how to classify objects by analyzing a set of instances (already solved cases) whose classes are known. Instances are typically represented as attribute-value vectors. The learning input consists of a set of such vectors, each belonging to a known class, and the output is a mapping from attribute values to classes. This mapping should accurately classify both the given instances (the training set) and other, unseen instances (the testing set).
- Split selection: A method used during decision tree induction to select the most appropriate attribute and its splits in each attribute (test) node of the tree. Split selection is usually based on an impurity measure and is considered the most important aspect of decision tree learning.
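One widely used impurity measure for split selection is entropy: the attribute that most reduces the weighted impurity of the resulting subsets (i.e., has maximum information gain) is chosen for the test node. The sketch below illustrates this idea on made-up data; it is not the article's exact formulation.

```python
from collections import Counter
from math import log2

def entropy(classes):
    """Impurity of a list of decision classes (0 = pure)."""
    n = len(classes)
    return -sum((c / n) * log2(c / n) for c in Counter(classes).values())

def information_gain(objects, classes, attribute):
    """Entropy reduction achieved by splitting on one attribute."""
    total = entropy(classes)
    n = len(objects)
    remainder = 0.0
    for value in {o[attribute] for o in objects}:
        subset = [c for o, c in zip(objects, classes) if o[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

objects = [{"windy": "yes"}, {"windy": "yes"}, {"windy": "no"}, {"windy": "no"}]
classes = ["no", "no", "yes", "yes"]
print(information_gain(objects, classes, "windy"))  # 1.0: a perfect split
```

Split selection then evaluates this gain for every candidate attribute and picks the best one, repeating recursively in each child node.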
- Training object: An object used for the induction of a decision tree. In a training object, both the attribute values and the decision class are known. All the training objects together constitute a training set, which is the source of the "domain knowledge" that the decision tree will try to represent.
- Testing object: An object used for the evaluation of a decision tree. In a testing object, the attribute values are known, but the decision class is unknown to the decision tree. All the testing objects together constitute a testing set, which is used to test an induced decision tree, i.e., to evaluate its quality in terms of classification accuracy.
- Training set: A prepared set of training objects.
- Testing set: A prepared set of testing objects.
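The complementary roles of the two sets can be shown with a deliberately trivial stand-in for tree induction: a classifier that always predicts the training set's majority class. The data and the `train`/`evaluate` helpers are illustrative assumptions, not the article's method; a real decision tree learner would replace `train`.

```python
from collections import Counter

def train(training_set):
    """'Induce' a trivial classifier: always predict the majority class
    seen in the training set (a placeholder for real tree induction)."""
    majority = Counter(cls for _, cls in training_set).most_common(1)[0][0]
    return lambda obj: majority

def evaluate(classifier, testing_set):
    """Testing-set accuracy: the decision class is withheld from the
    classifier and used only to check its predictions."""
    correct = sum(1 for obj, cls in testing_set if classifier(obj) == cls)
    return 100.0 * correct / len(testing_set)

data = [({"a": 1}, "yes"), ({"a": 2}, "yes"), ({"a": 3}, "no"),
        ({"a": 4}, "yes"), ({"a": 5}, "no"),  ({"a": 6}, "yes")]
training_set, testing_set = data[:4], data[4:]

model = train(training_set)
print(evaluate(model, testing_set))  # 50.0
```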
Bibliography
Primary Literature
Babic SH, Kokol P, Stiglic MM (2000) Fuzzy decision trees in the support of breastfeeding. In: Proceedings of the 13th IEEE Symposium on Computer‐Based Medical Systems CBMS'2000, Houston, pp 7–11
Banerjee A (1994) Initializing neural networks using decision trees. In: Proceedings of the International Workshop on Computational Learning Theory and Natural Learning Systems, Cambridge, pp 3–15
Bonner G (2001) Decision making for health care professionals: use of decision trees within the community mental health setting. J Adv Nursing 35:349–356
Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
Cantu-Paz E, Kamath C (2000) Using evolutionary algorithms to induce oblique decision trees. In: Proceedings of the Genetic and Evolutionary Computation Conference GECCO-2000, Las Vegas, pp 1053–1060
Craven MW, Shavlik JW (1996) Extracting tree‐structured representations of trained networks. In: Advances in Neural Information Processing Systems, vol 8. MIT Press, Cambridge
Crawford S (1989) Extensions to the CART algorithm. Int J Man‐Mach Stud 31(2):197–217
Cremilleux B, Robert C (1997) A theoretical framework for decision trees in uncertain domains: Application to medical data sets. In: Lecture Notes in Artificial Intelligence, vol 1211. Springer, London, pp 145–156
Dantchev N (1996) Therapeutic decision trees in psychiatry. Encephale‐Revue Psychiatr Clinique Biol Therap 22(3):205–214
Dietterich TG, Kong EB (1995) Machine learning bias, statistical bias and statistical variance of decision tree algorithms. Mach Learn, Corvallis
Feigenbaum EA, Simon HA (1962) A theory of the serial position effect. Br J Psychol 53:307–320
Freund Y (1995) Boosting a weak learning algorithm by majority. Inf Comput 121:256–285
Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Machine Learning: Proc. Thirteenth International Conference. Morgan Kauffman, San Francisco, pp 148–156
Gambhir SS (1999) Decision analysis in nuclear medicine. J Nucl Med 40(9):1570–1581
Gehrke J (2003) Decision Trees. In: Ye N (ed) The Handbook of Data Mining. Lawrence Erlbaum, Mahwah
Goebel M, Gruenwald L (1999) A survey of data mining software tools. SIGKDD Explor 1(1):20–33
Goldberg DE (1989) Genetic Algorithms in Search, Optimization, and Machine Learning. Addison Wesley, Reading
Hand D (1997) Construction and assessment of classification rules. Wiley, Chichester
Heath D et al (1993) k-DT: A multi-tree learning method. In: Proceedings of the Second International Workshop on Multistrategy Learning, Harpers Ferry, pp 138–149
Heath D et al (1993) Learning Oblique Decision Trees. In: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence IJCAI-93, pp 1002–1007
Ho TK (1998) The Random Subspace Method for Constructing Decision Forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
Hunt EB, Marin J, Stone PT (1966) Experiments in Induction. Academic Press, New York, pp 45–69
Jones JK (2001) The role of data mining technology in the identification of signals of possible adverse drug reactions: Value and limitations. Curr Ther Res‐Clin Exp 62(9):664–672
Kilpatrick S et al (1983) Optimization by Simulated Annealing. Science 220(4598):671–680
Kokol P, Zorman M, Stiglic MM, Malcic I (1998) The limitations of decision trees and automatic learning in real world medical decision making. In: Proceedings of the 9th World Congress on Medical Informatics MEDINFO'98, 52, pp 529–533
Letourneau S, Jensen L (1998) Impact of a decision tree on chronic wound care. J Wound Ostomy Conti Nurs 25:240–247
Lim T-S, Loh W-Y, Shih Y-S (2000) A comparison of prediction accuracy, complexity, and training time of thirty‐three old and new classification algorithms. Mach Learn 48:203–228
Murthy KVS (1997) On Growing Better Decision Trees from Data. PhD dissertation, Johns Hopkins University, Baltimore
Neapolitan R, Naimipour K (1996) Foundations of Algorithms. DC Heath, Lexington
Nikolaev N, Slavov V (1998) Inductive genetic programming with decision trees. Intell Data Anal Int J 2(1):31–44
Ohno‐Machado L, Lacson R, Massad E (2000) Decision trees and fuzzy logic: A comparison of models for the selection of measles vaccination strategies in Brazil. Proceedings of AMIA Symposium 2000, Los Angeles, CA, US, pp 625–629
Paterson A, Niblett TB (1982) ACLS Manual. Intelligent Terminals, Edinburgh
Podgorelec V (2001) Intelligent systems design and knowledge discovery with automatic programming. PhD thesis, University of Maribor
Podgorelec V, Kokol P (1999) Induction of medical decision trees with genetic algorithms. In: Proceedings of the International ICSC Congress on Computational Intelligence Methods and Applications CIMA. Academic Press, Rochester
Podgorelec V, Kokol P (2001) Towards more optimal medical diagnosing with evolutionary algorithms. J Med Syst 25(3):195–219
Podgorelec V, Kokol P (2001) Evolutionary decision forests – decision making with multiple evolutionary constructed decision trees. In: Problems in Applied Mathematics and Computational Intelligence. WSES Press, pp 97–103
Quinlan JR (1979) Discovering rules by induction from large collections of examples. In: Michie D (ed) Expert Systems in the Micro Electronic Age. University Press, Edinburgh, pp 168–201
Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
Quinlan JR (1987) Simplifying decision trees. Int J Man‐Mach Stud 27:221–234
Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco
Rich E, Knight K (1991) Artificial Intelligence, 2nd edn. McGraw Hill, New York
Sanders GD, Hagerty CG, Sonnenberg FA, Hlatky MA, Owens DK (2000) Distributed decision support using a web-based interface: prevention of sudden cardiac death. Med Decis Making 19(2):157–166
Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227
Shannon C, Weaver W (1949) The mathematical theory of communication. University of Illinois Press, Champaign
Shlien S (1992) Multiple binary decision tree classifiers. Pattern Recognit Lett 23(7):757–763
Sims CJ, Meyn L, Caruana R, Rao RB, Mitchell T, Krohn M (2000) Predicting cesarean delivery with decision tree models. Am J Obstet Gynecol 183:1198–1206
Smyth P, Goodman RM (1991) Rule induction using information theory. In: Piatetsky-Shapiro G, Frawley WJ (eds) Knowledge Discovery in Databases. AAAI Press, Cambridge, pp 159–176
Sprogar M, Kokol P, Hleb S, Podgorelec V, Zorman M (2000) Vector decision trees. Intell Data Anal 4(3–4):305–321
Tou JT, Gonzalez RC (1974) Pattern Recognition Principles. Addison‐Wesley, Reading
Tsien CL, Fraser HSF, Long WJ, Kennedy RL (1998) Using classification tree and logistic regression methods to diagnose myocardial infarction. In: Proceedings of the 9th World Congress on Medical Informatics MEDINFO'98, 52, pp 493–497
Tsien CL, Kohane IS, McIntosh N (2000) Multiple signal integration by decision tree induction to detect artifacts in the neonatal intensive care unit. Artif Intell Med 19(3):189–202
Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2):161–186
Utgoff PE (1989) Perceptron trees: a case study in hybrid concept representations. Connect Sci 1:377–391
White AP, Liu WZ (1994) Bias in information-based measures in decision tree induction. Mach Learn 15:321–329
Zorman M, Hleb S, Sprogar M (1999) Advanced tool for building decision trees MtDecit 2.0. In: Arabnia HR (ed) Proceedings of the International Conference on Artificial Intelligence ICAI-99. Las Vegas
Zorman M, Kokol P, Podgorelec V (2000) Medical decision making supported by hybrid decision trees. In: Proceedings of the ICSC Symposia on Intelligent Systems & Applications ISA'2000, ICSC Academic Press, Wollongong
Zorman M, Podgorelec V, Kokol P, Peterson M, Lane J (2000) Decision tree's induction strategies evaluated on a hard real world problem. In: Proceedings of the 13th IEEE Symposium on Computer‐Based Medical Systems CBMS'2000, Houston, pp 19–24
Zorman M, Sigut JF, de la Rosa SJL, Alayón S, Kokol P, Verlič M (2006) Evolutionary built decision trees for supervised segmentation of follicular lymphoma images. In: Proceedings of the 9th IASTED International Conference on Intelligent Systems and Control, Honolulu, pp 182–187
Books and Reviews
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
Han J, Kamber M (2006) Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco
Hand D, Mannila H, Smyth P (2001) Principles of Data Mining. MIT Press, Cambridge
Kantardzic M (2003) Data Mining: Concepts, Models, Methods, and Algorithms. Wiley, San Francisco
Mitchell TM (1997) Machine Learning. McGraw‐Hill, New York
Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco
Ye N (ed) (2003) The Handbook of Data Mining. Lawrence Erlbaum, Mahwah
© 2012 Springer-Verlag
Podgorelec, V., Zorman, M. (2012). Decision Trees. In: Meyers, R. (ed) Computational Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1800-9_53
Print ISBN: 978-1-4614-1799-6
Online ISBN: 978-1-4614-1800-9