Abstract
This work presents our line of research and current results on applying structured parallel programming tools to the development of scalable data-mining applications. We discuss how to exploit the divide-and-conquer nature of the well-known C4.5 classification algorithm despite its in-core memory requirements, and we advocate applying external-memory techniques to manage the data. Current experimental results are reported.
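To make the divide-and-conquer structure mentioned in the abstract concrete, the following is a minimal sequential sketch of decision-tree induction, not the authors' implementation. It assumes categorical attributes only and selects splits by plain information gain as a simplification; C4.5 itself uses the gain ratio and also handles continuous attributes and pruning. All function names (`entropy`, `partition`, `information_gain`, `build_tree`, `classify`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def partition(rows, attr):
    """Divide step: group (features, label) rows by the value of one attribute."""
    parts = {}
    for features, label in rows:
        parts.setdefault(features[attr], []).append((features, label))
    return parts


def information_gain(rows, attr):
    """Entropy reduction obtained by splitting rows on attr (simplified criterion)."""
    labels = [label for _, label in rows]
    remainder = sum(len(part) / len(rows) * entropy([l for _, l in part])
                    for part in partition(rows, attr).values())
    return entropy(labels) - remainder


def build_tree(rows, attrs):
    """Return a leaf (class label) or an internal node (attribute, children)."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Base case: the node is pure or there is nothing left to test.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, a))
    if information_gain(rows, best) <= 0:
        return majority
    remaining = [a for a in attrs if a != best]
    # Conquer step: each partition is an independent subproblem.
    children = {value: build_tree(subset, remaining)
                for value, subset in partition(rows, best).items()}
    return (best, children)


def classify(tree, features):
    """Follow the tests from the root down to a leaf (labels must not be tuples)."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[features[attr]]
    return tree


# Toy data set, invented purely for illustration.
rows = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
tree = build_tree(rows, ["outlook", "windy"])
```

The recursive calls in `build_tree` operate on disjoint partitions and share no state, which is the kind of independence a parallel divide-and-conquer formulation could exploit. The sketch also holds the whole data set in memory at every node, which illustrates the in-core requirement that the abstract identifies as an obstacle.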
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Becuzzi, P., Coppola, M., Ruggieri, S., Vanneschi, M. (2000). Parallelisation of C4.5 as a Particular Divide and Conquer Computation. In: Rolim, J. (eds) Parallel and Distributed Processing. IPDPS 2000. Lecture Notes in Computer Science, vol 1800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45591-4_50
DOI: https://doi.org/10.1007/3-540-45591-4_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67442-9
Online ISBN: 978-3-540-45591-2
eBook Packages: Springer Book Archive