A Time Complexity Analysis of the ParDTLT Parallel Algorithm for Decision Tree Induction
Beyond the usual tests for evaluating the classification performance of a decision tree, an analysis of the time and space resources required during supervised decision tree induction is also valuable. The parallel algorithm "Parallel Decision Tree for Large Datasets" (ParDTLT for short) has proved to perform very well when large datasets are involved in the training and classification process. During the training phase, the expansion of a node is processed in parallel, considering only a subset of the whole set of training objects. The time complexity analysis proves a linear dependence on the cardinality of the complete set of training objects, and an asymptotically linear or log-linear dependence on the cardinality of the selected subset of training objects for categorical and numerical data, respectively.
Keywords: Entropy · ParDTLT algorithm · Time complexity · Synthetic data · Real data
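The complexity gap stated in the abstract can be illustrated with a minimal sketch of entropy-based split evaluation, the core operation of node expansion. This is not the ParDTLT implementation; the function names and the data are hypothetical. For a categorical attribute, one pass over the m objects at a node groups them by value, so the cost is O(m); for a numeric attribute, candidate thresholds require the objects to be sorted first, which contributes the O(m log m) term.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence; one O(m) pass."""
    m = len(labels)
    return -sum((c / m) * math.log2(c / m) for c in Counter(labels).values())

def categorical_split_entropy(values, labels):
    """Weighted entropy after splitting on a categorical attribute.
    A single pass groups the objects by value, so the cost is linear in m."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    m = len(labels)
    return sum(len(g) / m * entropy(g) for g in groups.values())

def best_numeric_threshold(values, labels):
    """Best binary threshold for a numeric attribute.
    Sorting dominates the cost, giving the O(m log m) term;
    the sweep that follows maintains incremental class counts in O(m)."""
    pairs = sorted(zip(values, labels))                  # O(m log m)
    m = len(pairs)
    left = Counter()
    right = Counter(y for _, y in pairs)

    def h(counts, size):
        return -sum(c / size * math.log2(c / size)
                    for c in counts.values() if c)

    best_h, best_t = float("inf"), None
    for i in range(1, m):                                # single O(m) sweep
        y = pairs[i - 1][1]
        left[y] += 1
        right[y] -= 1
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                     # no threshold between equal values
        split_h = i / m * h(left, i) + (m - i) / m * h(right, m - i)
        if split_h < best_h:
            best_h = split_h
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_t, best_h
```

Summed over all objects reaching a tree level, these per-node costs yield the linear (categorical) versus log-linear (numeric) dependence on the subset cardinality that the analysis proves.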