Abstract
This paper presents a parallel implementation of CURE, an efficient hierarchical data clustering algorithm, using the OpenMP programming model. OpenMP allows the asymmetry and non-determinism in CURE to be managed transparently, while our OpenMP runtime support enables effective exploitation of the algorithm's irregular nested loop-level parallelism. Experimental results for various problem parameters demonstrate the scalability of our implementation and its effective utilization of parallel hardware, which together make CURE practical for large data sets.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hadjidoukas, P.E., Amsaleg, L. (2008). Parallelization of a Hierarchical Data Clustering Algorithm Using OpenMP. In: Mueller, M.S., Chapman, B.M., de Supinski, B.R., Malony, A.D., Voss, M. (eds) OpenMP Shared Memory Parallel Programming. IWOMP 2005. Lecture Notes in Computer Science, vol 4315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68555-5_24
DOI: https://doi.org/10.1007/978-3-540-68555-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68554-8
Online ISBN: 978-3-540-68555-5
eBook Packages: Computer Science (R0)