Trained Decision Trees for a More Meaningful Accuracy (150 Patients)
Traditionally, decision trees are used for finding the best predictors of health risks and improvements (Chap. 58). However, this method is not entirely appropriate, because a decision tree is built from a data file, and, subsequently, the same data file is applied once more for computing the health risk probabilities from the built tree. Obviously, the accuracy must be close to 100%, because the test sample is 100% identical to the sample used for building the tree, and this accuracy is therefore not very meaningful. With neural networks, this problem of duplicate usage of the same data is solved by randomly splitting the data into two samples, a training sample and a test sample (Chap. 12 in Machine Learning in Medicine Part One, pp. 145–156, Artificial intelligence, multilayer perceptron modeling, Springer, Heidelberg, Germany, 2013, from the same authors). The current chapter assesses whether this splitting methodology, otherwise called partitioning, is also feasible for decision trees, and what level of accuracy it yields. Decision trees are appropriate for data with both categorical and continuous outcomes (Chap. 58).
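The partitioning idea described above can be sketched in a few lines. This is a minimal illustration, not the chapter's own analysis: it uses scikit-learn with simulated data standing in for the 150-patient file, and the 70/30 split ratio and all parameter values are illustrative assumptions.

```python
# Illustrative sketch: evaluating a decision tree on a held-out test sample
# rather than on the training data itself. Simulated data; the sample size
# of 150 mirrors the chapter's patient count, but all other settings are
# assumptions for demonstration only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Simulated binary-outcome data for 150 "patients" with 5 predictors.
X, y = make_classification(n_samples=150, n_features=5, random_state=0)

# Randomly partition into a training sample and a test sample (70/30 here).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# The tree is built from the training sample only.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Accuracy on the sample used to build the tree is optimistic (near 100%);
# accuracy on the unseen test sample is the more meaningful estimate.
train_acc = accuracy_score(y_train, tree.predict(X_train))
test_acc = accuracy_score(y_test, tree.predict(X_test))
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between the two accuracy figures is exactly the point made above: testing on the building sample overstates performance, while the held-out sample gives an honest estimate.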