
An OpenMP Parallelization of the K-means Algorithm Accelerated Using KD-trees

  • Wojciech Kwedlo
  • Michał Łubowicz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12043)

Abstract

In the paper, a KD-tree-based filtering algorithm for K-means clustering is considered. A parallel version of the algorithm for shared-memory systems is proposed, which uses OpenMP tasks both for KD-tree construction and for filtering in the assignment step of K-means. In our approach, an OpenMP task is created for a recursive call performed by the tree construction and filtering procedures. The data partitioning step during tree construction is also parallelized with OpenMP tasks. In computational experiments, we measured the runtimes of the parallel and serial versions of the filtering algorithm and of a parallel version of the classical Lloyd’s algorithm on six datasets sampled from two distributions. The results of experiments performed on a 24-core system indicate that our version of the filtering algorithm has very good parallel efficiency. Its runtime is up to four orders of magnitude shorter than that of the parallel Lloyd’s algorithm.
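The abstract describes the parallelization only at the level of recursive calls. The C++ sketch below illustrates that pattern under our own assumptions and is not the authors' implementation.

It builds a KD-tree whose nodes store a bounding box and the coordinate sum of their points. It then runs a filtering assignment step in the style of Kanungo et al.:

- for each cell, the candidate center z* nearest to the cell midpoint is found;
- any candidate that is not strictly closer than z* at the cell vertex furthest in its direction is pruned;
- when only z* remains, the whole cell is assigned at once using its stored sum.

In both `build` and `filter`, one of the two recursive calls is wrapped in `#pragma omp task`, the other runs in the current task, and `#pragma omp taskwait` joins them.

Assumptions:

- All names (`build`, `filter`, `add`, `nearest`, `kmeans_filtering`) and the tuning values `kLeaf` and `kTaskCutoff` are hypothetical.
- The split is a serial median split on the widest dimension via `std::nth_element`. The parallel partitioning step mentioned in the abstract is not reproduced.
- Center sums and counts are accumulated with `#pragma omp atomic`, which is only one of several possible choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>
using Vec = std::vector<double>;  // sketch: row-major points/centers; all names are hypothetical
using Idx = std::vector<std::size_t>;
constexpr std::size_t kLeaf = 8, kTaskCutoff = 1024;  // hypothetical tuning values
struct Node {  // cell: bounding box, coordinate sum, index range [begin, end), children
  Vec lo, hi, sum;
  std::size_t begin = 0, end = 0;
  std::unique_ptr<Node> left, right;
};
double dist2(const double* a, const double* b, std::size_t d) {  // squared Euclidean distance
  double s = 0;
  for (std::size_t j = 0; j < d; ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
  return s;
}
std::size_t nearest(const double* p, const Idx& cand, const Vec& c, std::size_t d) {
  std::size_t best = cand[0];  // candidate whose center is closest to p
  for (std::size_t k : cand)
    if (dist2(p, &c[k * d], d) < dist2(p, &c[best * d], d)) best = k;
  return best;
}
// Construction: the left call is an OpenMP task (undeferred for small cells), the right runs here.
std::unique_ptr<Node> build(const Vec& x, std::size_t d, Idx& idx, std::size_t b, std::size_t e) {
  auto node = std::make_unique<Node>();
  node->begin = b; node->end = e;
  node->lo.assign(&x[idx[b] * d], &x[idx[b] * d] + d);
  node->hi = node->sum = node->lo;
  for (std::size_t i = b + 1; i < e; ++i)
    for (std::size_t j = 0; j < d; ++j) {
      const double v = x[idx[i] * d + j];
      node->lo[j] = std::min(node->lo[j], v);
      node->hi[j] = std::max(node->hi[j], v);
      node->sum[j] += v;
    }
  if (e - b <= kLeaf) return node;
  std::size_t s = 0;  // split the widest dimension at its median (serial partition here)
  for (std::size_t j = 1; j < d; ++j)
    if (node->hi[j] - node->lo[j] > node->hi[s] - node->lo[s]) s = j;
  const std::size_t m = b + (e - b) / 2;
  std::nth_element(idx.begin() + b, idx.begin() + m, idx.begin() + e,
                   [&](std::size_t p, std::size_t q) { return x[p * d + s] < x[q * d + s]; });
#pragma omp task default(shared) if (e - b > kTaskCutoff)
  node->left = build(x, d, idx, b, m);
  node->right = build(x, d, idx, m, e);
#pragma omp taskwait
  return node;
}
// Adds count w and coordinate sum s to row k of acc (d sums, then the count); atomic for tasks.
void add(Vec& acc, std::size_t k, const double* s, double w, std::size_t d) {
  for (std::size_t j = 0; j <= d; ++j)
#pragma omp atomic
    acc[k * (d + 1) + j] += j < d ? s[j] : w;
}
// Filtering: drop candidates that cannot be nearest to any point of cell n, then recurse.
void filter(const Node& n, const Idx& cand, const Vec& x, const Vec& c, std::size_t d,
            const Idx& idx, Vec& acc) {
  if (!n.left) {  // leaf: nearest remaining candidate for each point
    for (std::size_t i = n.begin; i < n.end; ++i)
      add(acc, nearest(&x[idx[i] * d], cand, c, d), &x[idx[i] * d], 1.0, d);
    return;
  }
  Vec v(d);  // cell midpoint first, then reused for a cell vertex
  for (std::size_t j = 0; j < d; ++j) v[j] = 0.5 * (n.lo[j] + n.hi[j]);
  const std::size_t zs = nearest(v.data(), cand, c, d);  // z*
  Idx kept;
  for (std::size_t k : cand) {  // v = cell vertex furthest in the direction of k - z*
    for (std::size_t j = 0; j < d; ++j) v[j] = c[k * d + j] > c[zs * d + j] ? n.hi[j] : n.lo[j];
    if (k == zs || dist2(&c[k * d], v.data(), d) < dist2(&c[zs * d], v.data(), d)) kept.push_back(k);
  }
  if (kept.size() == 1)  // every point of the cell is nearest to z*: add the whole cell at once
    return add(acc, zs, n.sum.data(), static_cast<double>(n.end - n.begin), d);
#pragma omp task default(shared) if (n.end - n.begin > kTaskCutoff)
  filter(*n.left, kept, x, c, d, idx, acc);
  filter(*n.right, kept, x, c, d, idx, acc);
#pragma omp taskwait
}
// Driver: builds the tree once, then runs iters assignment/update iterations on c (k * d values).
void kmeans_filtering(const Vec& x, std::size_t d, Vec& c, int iters) {
  assert(d > 0 && !x.empty() && x.size() % d == 0 && !c.empty() && c.size() % d == 0);
  const std::size_t n = x.size() / d, k = c.size() / d;
  Idx idx(n), all(k);
  for (std::size_t i = 0; i < n; ++i) idx[i] = i;
  for (std::size_t j = 0; j < k; ++j) all[j] = j;
#pragma omp parallel
#pragma omp single
  {
    const auto root = build(x, d, idx, 0, n);
    for (int it = 0; it < iters; ++it) {
      Vec acc(k * (d + 1), 0.0);
      filter(*root, all, x, c, d, idx, acc);
      for (std::size_t j = 0; j < k; ++j)  // empty clusters keep their previous center
        for (std::size_t t = 0; acc[j * (d + 1) + d] > 0 && t < d; ++t)
          c[j * d + t] = acc[j * (d + 1) + t] / acc[j * (d + 1) + d];
    }
  }
}
```

Usage:

- Call `kmeans_filtering(points, d, centers, iters)`, where `points` holds n × d row-major coordinates and `centers` holds k × d initial centers that are updated in place.
- Compiled with `-fopenmp` (GCC or Clang), the tasks created inside the single `parallel`/`single` region are executed by the whole thread team.
- Without that flag, the pragmas are ignored and the same code runs serially.

The atomic updates keep the sketch short, but they may contend when many tasks update the same center. Per-thread accumulators merged after the assignment step are a common alternative.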

Keywords

K-means clustering, OpenMP tasks, KD-trees

Notes

Acknowledgments

This work was supported by Białystok University of Technology grant S/WI/2/2018, funded by the Polish Ministry of Science and Higher Education. The calculations were carried out at the Academic Computer Centre in Gdańsk, Poland.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Faculty of Computer Science, Białystok University of Technology, Białystok, Poland