A Time Complexity Analysis to the ParDTLT Parallel Algorithm for Decision Tree Induction

  • Joel Suárez-Cansino (corresponding author)
  • Anilú Franco-Árcega
  • Linda Gladiola Flores-Flores
  • Virgilio López-Morales
  • Ruslan Gabbasov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11288)

Abstract

In addition to the usual tests for analyzing the performance of a decision tree in a classification process, an analysis of the time and space resources required during supervised decision tree induction is also useful. The parallel algorithm called “Parallel Decision Tree for Large Datasets” (ParDTLT for short) has proved to perform very well when large datasets take part in the training and classification process. The training phase expands each node in parallel, considering only a subset of the whole set of training objects. The time complexity analysis proves a linear dependence on the cardinality of the complete set of training objects when categorical data are used, and an asymptotically log-linear dependence on the cardinality of the selected subset of training objects when numeric data are used.
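The linear versus log-linear distinction stated above can be illustrated with a small sketch. This is NOT the authors' ParDTLT implementation; it is a hypothetical entropy-based split evaluation, written here only to show why expanding a node over a subset of s training objects costs O(s) for a categorical attribute (a single counting pass) but O(s log s) for a numeric attribute (sorting to enumerate candidate thresholds). All function names are assumptions of this sketch.

```python
import math
from collections import Counter

def entropy_counts(counts, n):
    """Shannon entropy from class counts over n objects."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c)

def categorical_split_entropy(values, labels):
    """Weighted entropy after splitting on a categorical attribute.

    A single pass over the s objects groups them by attribute value: O(s).
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, Counter())[y] += 1
    n = len(labels)
    return sum(sum(g.values()) / n * entropy_counts(g, sum(g.values()))
               for g in groups.values())

def numeric_split_entropy(values, labels):
    """Best weighted entropy over thresholds of a numeric attribute.

    Sorting the s objects dominates at O(s log s); the subsequent threshold
    scan with incremental class counts is only O(s).
    """
    pairs = sorted(zip(values, labels))          # O(s log s)
    n = len(pairs)
    left, right = Counter(), Counter(y for _, y in pairs)
    best = entropy_counts(right, n)              # entropy with no split
    for i in range(1, n):                        # O(s) scan
        y = pairs[i - 1][1]
        left[y] += 1
        right[y] -= 1
        if pairs[i - 1][0] == pairs[i][0]:
            continue                             # equal values: no threshold
        best = min(best,
                   (i / n) * entropy_counts(left, i)
                   + ((n - i) / n) * entropy_counts(right, n - i))
    return best
```

On a toy subset a perfect split drives the weighted entropy to zero for both attribute types; the difference the abstract refers to is in the dominating cost term, the counting pass versus the sort.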

Keywords

Entropy · ParDTLT algorithm · Time complexity · Synthetic data · Real data

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Joel Suárez-Cansino¹ (corresponding author)
  • Anilú Franco-Árcega¹
  • Linda Gladiola Flores-Flores¹
  • Virgilio López-Morales¹
  • Ruslan Gabbasov¹
  1. Intelligent Computing Research Group, Information and Systems Technologies Research Center, Engineering and Basic Sciences Institute, Autonomous University of the State of Hidalgo, Mineral de la Reforma, Mexico