Improving Accuracy of Classification Based on C4.5 Decision Tree Algorithm Using Big Data Analytics

  • Conference paper
  • First Online:
Computational Intelligence in Data Mining

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 711)

Abstract

C4.5 is a decision tree algorithm and a widely used classification technique. In the era of big data, building a decision tree faces challenges such as data size, build time, and cost. The aim of decision tree construction is to increase accuracy on the training data. Predictive modeling requires splitting the training datasets, and MATLAB is a good choice for this. Decision trees also make data analysis easy, even for heterogeneous data. In this paper, C4.5 is implemented in MATLAB on four different datasets, and the results are reported as confusion matrices in terms of target and output classes. Finally, the features of the datasets are compared. The main objective of this research is to increase classification accuracy and reduce the time needed to build a classification model. We reduce the input space using the Bhattacharyya distance (BD). The proposed method shows better performance on the data files tested. With BD, the improved C4.5 performs better than the original C4.5 in every test case.
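
The abstract names two ingredients without giving formulas: the C4.5 split criterion and the Bhattacharyya distance (BD) used to reduce the input space. The Python sketch below illustrates both under stated assumptions; the paper itself used MATLAB, and its exact BD procedure is not described in the abstract. `gain_ratio` follows the standard C4.5 gain-ratio criterion for a categorical attribute. `select_features_by_bd` is a hypothetical helper that assumes binary labels, models each feature's class-conditional distribution as a 1-D Gaussian, and keeps the k features whose two classes are farthest apart in BD.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of splitting `labels` on the categorical attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def bhattacharyya_gaussian(xs_a, xs_b, eps=1e-12):
    """BD between two samples, ASSUMING each is drawn from a 1-D Gaussian."""
    m1, m2 = mean(xs_a), mean(xs_b)
    v1, v2 = pvariance(xs_a) + eps, pvariance(xs_b) + eps  # eps avoids zero variance
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2))))


def select_features_by_bd(rows, labels, k):
    """Hypothetical helper: indices of the k features with the largest BD.

    Sketch only: assumes exactly two classes and Gaussian features.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles binary labels only")
    a, b = classes
    scores = []
    for j in range(len(rows[0])):
        xa = [r[j] for r, y in zip(rows, labels) if y == a]
        xb = [r[j] for r, y in zip(rows, labels) if y == b]
        scores.append((bhattacharyya_gaussian(xa, xb), j))
    # Highest distance first: these features separate the classes best.
    return [j for _, j in sorted(scores, reverse=True)[:k]]


# Usage: feature 0 separates the classes, feature 1 is noise.
rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 3.0], [5.0, 5.1], [5.1, 1.1], [5.2, 2.9]]
labels = [0, 0, 0, 1, 1, 1]
kept = select_features_by_bd(rows, labels, k=1)
print(kept)
```

In this sketch the reduced feature set would be passed to tree construction, with `gain_ratio` evaluated only on the retained features; whether this matches the authors' pipeline is an assumption.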

Author information

Corresponding author

Correspondence to Bhavna Rawal.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rawal, B., Agarwal, R. (2019). Improving Accuracy of Classification Based on C4.5 Decision Tree Algorithm Using Big Data Analytics. In: Behera, H., Nayak, J., Naik, B., Abraham, A. (eds) Computational Intelligence in Data Mining. Advances in Intelligent Systems and Computing, vol 711. Springer, Singapore. https://doi.org/10.1007/978-981-10-8055-5_19