Abstract
This paper investigates scalable implementations of out-of-core, I/O-intensive Data Mining (DM) algorithms on affordable parallel architectures, such as clusters of workstations. To validate our approach, we used the K-means algorithm, a well-known DM clustering algorithm, as a test case.
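To make the term "out-of-core" concrete, the following is a minimal sequential sketch of K-means in which the dataset is never held in memory as a whole: each iteration streams the points chunk by chunk and keeps only per-cluster sums and counts. It is an illustration under stated assumptions, not the implementation described in the paper. The names `nearest`, `kmeans_out_of_core`, and `read_chunks` are hypothetical, and the parallel, cluster-of-workstations aspects are not modeled.

```python
from typing import Callable, Iterable, List, Sequence

Point = Sequence[float]


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


def kmeans_out_of_core(
    read_chunks: Callable[[], Iterable[Iterable[Point]]],
    initial_centroids: Sequence[Point],
    iterations: int = 10,
) -> List[List[float]]:
    """Run K-means while holding only one chunk of points in memory at a time.

    read_chunks is a hypothetical callback: every call must return a fresh
    iterable over chunks of points, for example by re-reading a file from disk.
    """
    centroids = [list(c) for c in initial_centroids]
    dim = len(centroids[0])
    for _ in range(iterations):
        # Per-cluster accumulators are the only state kept across chunks.
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        for chunk in read_chunks():  # one full pass over the dataset
            for point in chunk:
                k = nearest(point, centroids)
                counts[k] += 1
                for j in range(dim):
                    sums[k][j] += point[j]
        # Empty clusters keep their previous centroid.
        updated = [
            [s / counts[k] for s in sums[k]] if counts[k] else centroids[k]
            for k in range(len(centroids))
        ]
        if updated == centroids:  # assignments are stable, so stop early
            break
        centroids = updated
    return centroids
```

In this sketch every iteration costs one complete scan of the data, which is why the I/O behavior dominates once the dataset no longer fits in main memory.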
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baraglia, R., Laforenza, D., Orlando, S., Palmerini, P., Perego, R. (2000). Implementation Issues in the Design of I/O Intensive Data Mining Applications on Clusters of Workstations. In: Rolim, J. (eds) Parallel and Distributed Processing. IPDPS 2000. Lecture Notes in Computer Science, vol 1800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45591-4_46
DOI: https://doi.org/10.1007/3-540-45591-4_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67442-9
Online ISBN: 978-3-540-45591-2
eBook Packages: Springer Book Archive