An OpenMP Parallelization of the K-means Algorithm Accelerated Using KD-trees
In this paper, a KD-tree-based filtering algorithm for K-means clustering is considered. We propose a parallel version of the algorithm for shared-memory systems that uses OpenMP tasks both for KD-tree construction and for filtering in the assignment step of K-means. In our approach, an OpenMP task is created for a recursive call performed by the tree construction and filtering procedures. The data partitioning step during tree construction is also parallelized with OpenMP tasks. In computational experiments, we measured the runtimes of the parallel and serial versions of the filtering algorithm and of a parallel version of the classical Lloyd's algorithm on six datasets sampled from two distributions. The results of experiments performed on a 24-core system indicate that our version of the filtering algorithm has very good parallel efficiency. Its runtime is up to four orders of magnitude shorter than that of the parallel Lloyd's algorithm.
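The task-per-recursive-call pattern can be sketched in C with OpenMP as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes fixed 2-D points (`DIM`), splits each node's bounding box at the midpoint of its widest dimension, makes nodes with at most a hypothetical `LEAF_SIZE` points into leaves, and stops deferring tasks below a hypothetical depth `TASK_DEPTH` through the task `if` clause. Each node stores its point count and vector sum, the per-subtree statistics a filtering step can use to update a centroid for a whole subtree at once. The names `KdNode`, `kd_build`, `kd_leaf_points` and `kd_free` are hypothetical, and the data-partitioning step, which the paper also parallelizes with tasks, is kept serial here for brevity. Compile with `-fopenmp`; without it the pragmas are ignored and the code runs serially.

```c
#include <assert.h>
#include <stdlib.h>

#define DIM 2         /* hypothetical: fixed dimensionality for this sketch */
#define LEAF_SIZE 8   /* hypothetical: nodes with <= LEAF_SIZE points are leaves */
#define TASK_DEPTH 6  /* hypothetical: below this depth, recursive calls run undeferred */

typedef struct KdNode {
    int start, count;            /* node owns points[start .. start + count) */
    double lo[DIM], hi[DIM];     /* bounding box of the node's points */
    double sum[DIM];             /* vector sum of the node's points */
    struct KdNode *left, *right; /* both NULL for a leaf */
} KdNode;

static void swap_points(double (*p)[DIM], int a, int b) {
    for (int d = 0; d < DIM; d++) {
        double t = p[a][d];
        p[a][d] = p[b][d];
        p[b][d] = t;
    }
}

static KdNode *build(double (*p)[DIM], int start, int count, int depth) {
    KdNode *nd = calloc(1, sizeof *nd); /* zeroes sum and child pointers */
    assert(nd != NULL);
    nd->start = start;
    nd->count = count;
    for (int d = 0; d < DIM; d++)
        nd->lo[d] = nd->hi[d] = p[start][d];
    for (int i = start; i < start + count; i++)
        for (int d = 0; d < DIM; d++) {
            if (p[i][d] < nd->lo[d]) nd->lo[d] = p[i][d];
            if (p[i][d] > nd->hi[d]) nd->hi[d] = p[i][d];
            nd->sum[d] += p[i][d];
        }
    if (count <= LEAF_SIZE)
        return nd;

    /* Choose the widest dimension and split at the midpoint of the box. */
    int sd = 0;
    for (int d = 1; d < DIM; d++)
        if (nd->hi[d] - nd->lo[d] > nd->hi[sd] - nd->lo[sd]) sd = d;
    double mid = 0.5 * (nd->lo[sd] + nd->hi[sd]);

    /* Serial in-place partition: [start, i) < mid, the rest >= mid. */
    int i = start, j = start + count - 1;
    while (i <= j) {
        if (p[i][sd] < mid) i++;
        else swap_points(p, i, j--);
    }
    int nleft = i - start;
    if (nleft == 0 || nleft == count) /* degenerate split, e.g. duplicate points */
        return nd;

    /* The left recursive call becomes a task (deferred only near the root);
       the current task builds the right subtree on the disjoint upper range. */
    #pragma omp task if(depth < TASK_DEPTH)
    nd->left = build(p, start, nleft, depth + 1);
    nd->right = build(p, start + nleft, count - nleft, depth + 1);
    #pragma omp taskwait
    return nd;
}

/* Builds the tree, reordering p in place; one thread starts the recursion. */
KdNode *kd_build(double (*p)[DIM], int n) {
    assert(n > 0);
    KdNode *root = NULL;
    #pragma omp parallel
    #pragma omp single
    root = build(p, 0, n, 0);
    return root;
}

/* Sums point counts over the leaves (equals the root count for a valid tree). */
int kd_leaf_points(const KdNode *nd) {
    if (nd->left == NULL) return nd->count;
    return kd_leaf_points(nd->left) + kd_leaf_points(nd->right);
}

void kd_free(KdNode *nd) {
    if (nd == NULL) return;
    kd_free(nd->left);
    kd_free(nd->right);
    free(nd);
}
```

A caller passes an array such as `double pts[n][DIM]`; `kd_build` reorders it so that every node covers a contiguous range of points, and `kd_free` releases the tree afterwards.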
Keywords: K-means clustering; OpenMP tasks; KD-trees
This work was supported by Białystok University of Technology grant S/WI/2/2018 funded by Polish Ministry of Science and Higher Education. The calculations were carried out at the Academic Computer Centre in Gdańsk, Poland.