Parallel Subspace Clustering Using Multi-core and Many-core Architectures

  • Amitava Datta
  • Amardeep Kaur
  • Tobias Lauer
  • Sami Chabbouh
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 767)

Abstract

Finding clusters in high-dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of a dataset, where a subspace is a subset of the dimensions of the data. However, the number of subspaces grows exponentially with the dimensionality of the data, which renders most of these algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependencies in the clustering process, which makes parallelization difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm that scales with the number of dimensions and contains independent processing steps that can be exploited through parallelism. In this paper, we first leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm; the experimental evaluation shows linear speedup. Second, we are developing an approach that uses graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.
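To make the exponential growth mentioned above concrete, the following minimal sketch enumerates every non-empty subset of dimensions for a small dataset and counts them for larger ones. It is not part of SUBSCALE; the function names `enumerate_subspaces` and `count_subspaces` are hypothetical and chosen only for illustration. It relies on the standard fact that a set of d dimensions has 2^d - 1 non-empty subsets.

```python
from itertools import combinations


def enumerate_subspaces(num_dims):
    """Yield every non-empty subset of the dimension indices 0..num_dims-1.

    Illustrative helper (hypothetical name): a brute-force enumeration,
    not the strategy used by any particular clustering algorithm.
    """
    for size in range(1, num_dims + 1):
        for subspace in combinations(range(num_dims), size):
            yield subspace


def count_subspaces(num_dims):
    """Return the number of non-empty subspaces of a num_dims-dimensional space."""
    return 2 ** num_dims - 1


if __name__ == "__main__":
    # All subspaces of a 3-dimensional dataset.
    print(list(enumerate_subspaces(3)))
    # The count grows exponentially with the number of dimensions.
    for d in (3, 10, 20, 50):
        print(d, count_subspaces(d))
```

Already at 50 dimensions the count exceeds 10^15, which is why exhaustively searching every subspace is impractical and why independent processing steps that can run in parallel are attractive.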

Keywords

Data mining, Subspace clustering, Multi-core architectures, Many-core architectures, GPU computing

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Amitava Datta (1)
  • Amardeep Kaur (1)
  • Tobias Lauer (2, email author)
  • Sami Chabbouh (2)

  1. School of Computer Science and Software Engineering, University of Western Australia, Perth, Australia
  2. Department of Electrical Engineering and Information Technology, Offenburg University of Applied Sciences, Offenburg, Germany
