
Parallelisation of C4.5 as a Particular Divide and Conquer Computation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1800)

Abstract

We present our line of research and current results on applying structured parallel programming tools to the development of scalable data-mining applications. We discuss how the divide-and-conquer nature of the well-known C4.5 classification algorithm can be exploited in spite of its in-core memory requirements. We advocate applying external-memory techniques to manage the data, and we report the current experimental results.
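
The divide-and-conquer structure mentioned in the abstract can be illustrated with a minimal sketch of decision-tree induction. This is not the authors' implementation: it assumes categorical attributes only, keeps all data in memory, omits pruning, and selects splits by plain information gain rather than the gain ratio used by C4.5. The function names (entropy, best_attribute, build_tree) and the toy data set are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Return the attribute with the highest information gain.

    Simplification: C4.5 itself uses the gain ratio criterion.
    """
    base = entropy(labels)

    def gain(attr):
        parts = {}
        for row, label in zip(rows, labels):
            parts.setdefault(row[attr], []).append(label)
        remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        return base - remainder

    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Build a decision tree as nested dicts; leaves are class labels."""
    # Base case: the node is pure or no attribute is left to test,
    # so it becomes a leaf labelled with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    attr = best_attribute(rows, labels, attributes)

    # Divide: partition the cases by the value of the chosen attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        part_rows, part_labels = partitions.setdefault(row[attr], ([], []))
        part_rows.append(row)
        part_labels.append(label)

    # Conquer: each partition is an independent subproblem, which is
    # what makes the subtrees candidates for parallel evaluation.
    remaining = [a for a in attributes if a != attr]
    return {
        attr: {
            value: build_tree(part_rows, part_labels, remaining)
            for value, (part_rows, part_labels) in partitions.items()
        }
    }


# Hypothetical toy data set, used only to exercise the sketch.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "play", "play", "stay", "stay"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

The recursive calls share no state, so in principle each one could be handed to a separate worker. The sketch also shows where the in-core requirement comes from: every call holds its whole partition in memory.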




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Becuzzi, P., Coppola, M., Ruggieri, S., Vanneschi, M. (2000). Parallelisation of C4.5 as a Particular Divide and Conquer Computation. In: Rolim, J. (eds) Parallel and Distributed Processing. IPDPS 2000. Lecture Notes in Computer Science, vol 1800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45591-4_50


  • DOI: https://doi.org/10.1007/3-540-45591-4_50


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67442-9

  • Online ISBN: 978-3-540-45591-2

  • eBook Packages: Springer Book Archive
