Data Mining Using Graphics Processing Units

  • Christian Böhm
  • Robert Noll
  • Claudia Plant
  • Bianca Wackersreuther
  • Andrew Zherdin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5740)


During the last few years, Graphics Processing Units (GPU) have evolved from simple devices for the display signal preparation into powerful coprocessors that do not only support typical computer graphics tasks such as rendering of 3D scenarios but can also be used for general numeric and symbolic computation tasks such as simulation and optimization. As major advantage, GPUs provide extremely high parallelism (with several hundred simple programmable processors) combined with a high bandwidth in memory transfer at low cost. In this paper, we propose several algorithms for computationally expensive data mining tasks like similarity search and clustering which are designed for the highly parallel environment of a GPU. We define a multidimensional index structure which is particularly suited to support similarity queries under the restricted programming model of a GPU, and define a similarity join method. Moreover, we define highly parallel algorithms for density-based and partitioning clustering. In an extensive experimental evaluation, we demonstrate the superiority of our algorithms running on GPU over their conventional counterparts in CPU.


Index Structure Query Point Core Object Graphic Processor Similarity Query 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    NVIDIA CUDA Compute Unified Device Architecture - Programming Guide (2007)Google Scholar
  2. 2.
    Bernstein, D.J., Chen, T.-R., Cheng, C.-M., Lange, T., Yang, B.-Y.: Ecm on graphics cards. In: Soux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 483–501. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  3. 3.
    Böhm, C., Braunmüller, B., Breunig, M.M., Kriegel, H.-P.: High performance clustering based on the similarity join. In: CIKM, pp. 298–305 (2000)Google Scholar
  4. 4.
    Böhm, C., Noll, R., Plant, C., Zherdin, A.: Indexsupported similarity join on graphics processors. In: BTW, pp. 57–66 (2009)Google Scholar
  5. 5.
    Breunig, M.M., Kriegel, H.-P., Ng, R.T., Sander, J.: Lof: Identifying density-based local outliers. In: SIGMOD Conference, pp. 93–104 (2000)Google Scholar
  6. 6.
    Cao, F., Tung, A.K.H., Zhou, A.: Scalable clustering using graphics processors. In: Yu, J.X., Kitsuregawa, M., Leong, H.-V. (eds.) WAIM 2006. LNCS, vol. 4016, pp. 372–384. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  7. 7.
    Catanzaro, B.C., Sundaram, N., Keutzer, K.: Fast support vector machine training and classification on graphics processors. In: ICML, pp. 104–111 (2008)Google Scholar
  8. 8.
    Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226–231 (1996)Google Scholar
  9. 9.
    Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P.: Knowledge discovery and data mining: Towards a unifying framework. In: KDD, pp. 82–88 (1996)Google Scholar
  10. 10.
    Govindaraju, N.K., Gray, J., Kumar, R., Manocha, D.: Gputerasort: high performance graphics co-processor sorting for large database management. In: SIGMOD Conference, pp. 325–336 (2006)Google Scholar
  11. 11.
    Govindaraju, N.K., Lloyd, B., Wang, W., Lin, M.C., Manocha, D.: Fast computation of database operations using graphics processors. In: SIGMOD Conference, pp. 215–226 (2004)Google Scholar
  12. 12.
    Guha, S., Rastogi, R., Shim, K.: Cure: An efficient clustering algorithm for large databases. In: SIGMOD Conference, pp. 73–84 (1998)Google Scholar
  13. 13.
    Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: SIGMOD Conference, pp. 47–57 (1984)Google Scholar
  14. 14.
    He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational joins on graphics processors. In: SIGMOD, pp. 511–524 (2008)Google Scholar
  15. 15.
    Katz, G.J., Kider, J.T.: All-pairs shortest-paths for large graphs on the gpu. In: Graphics Hardware, pp. 47–55 (2008)Google Scholar
  16. 16.
    Kitsuregawa, M., Harada, L., Takagi, M.: Join strategies on kd-tree indexed relations. In: ICDE, pp. 85–93 (1989)Google Scholar
  17. 17.
    Koperski, K., Han, J.: Discovery of spatial association rules in geographic information databases. In: Egenhofer, M.J., Herring, J.R. (eds.) SSD 1995. LNCS, vol. 951, pp. 47–66. Springer, Heidelberg (1995)CrossRefGoogle Scholar
  18. 18.
    Leutenegger, S.T., Edgington, J.M., Lopez, M.A.: Str: A simple and efficient algorithm for r-tree packing. In: ICDE, pp. 497–506 (1997)Google Scholar
  19. 19.
    Lieberman, M.D., Sankaranarayanan, J., Samet, H.: A fast similarity join algorithm using graphics processing units. In: ICDE, pp. 1111–1120 (2008)Google Scholar
  20. 20.
    Liu, W., Schmidt, B., Voss, G., Müller-Wittig, W.: Molecular dynamics simulations on commodity gpus with cuda. In: Aluru, S., Parashar, M., Badrinath, R., Prasanna, V.K. (eds.) HiPC 2007. LNCS, vol. 4873, pp. 185–196. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  21. 21.
    Macqueen, J.B.: Some methods of classification and analysis of multivariate observations. In: Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)Google Scholar
  22. 22.
    Manavski, S., Valle, G.: Cuda compatible gpu cards as efficient hardware accelerators for smith-waterman sequence alignment. BMC Bioinformatics 9 (2008)Google Scholar
  23. 23.
    Meila, M.: The uniqueness of a good optimum for k-means. In: ICML, pp. 625–632 (2006)Google Scholar
  24. 24.
    Plant, C., Böhm, C., Tilg, B., Baumgartner, C.: Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data. Bioinformatics 22(8), 981–988 (2006)CrossRefGoogle Scholar
  25. 25.
    Shalom, S.A.A., Dash, M., Tue, M.: Efficient k-means clustering using accelerated graphics processors. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2008. LNCS, vol. 5182, pp. 166–175. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  26. 26.
    Szalay, A., Gray, J.: 2020 computing: Science in an exponential world. Nature 440, 413–414 (2006)CrossRefGoogle Scholar
  27. 27.
    Tasora, A., Negrut, D., Anitescu, M.: Large-scale parallel multi-body dynamics with frictional contact on the graphical processing unit. Proc. of Inst. Mech. Eng. Journal of Multi-body Dynamics 222(4), 315–326Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Christian Böhm
    • 1
  • Robert Noll
    • 1
  • Claudia Plant
    • 2
  • Bianca Wackersreuther
    • 1
  • Andrew Zherdin
    • 2
  1. 1.University of MunichGermany
  2. 2.Technische Universität MünchenGermany

Personalised recommendations