Online Document Clustering Using GPUs

  • Conference paper
In: New Trends in Databases and Information Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 241)

Abstract

An algorithm for performing online clustering on the GPU is proposed that makes heavy use of the GPU's atomic operations to cluster multiple documents at the same time, in a way that can saturate all of the GPU's parallel threads. On a real-time news document data set as well as on randomly generated data, the algorithm achieves up to a 3X speedup over a baseline GPU algorithm that clusters only one document at a time.
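The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of clustering several documents concurrently with atomic updates. It is not the authors' implementation: CPU threads and `std::atomic` stand in for GPU threads and GPU atomics, and every name and constant (`Doc`, `Cluster`, `Clustering`, `add_document`, `cluster_all`, `kDims`, `kMaxClusters`, the cosine threshold) is hypothetical. Each worker compares a document with the current cluster sums, claims a new cluster slot with an atomic increment when nothing is similar enough, and folds the document into the chosen cluster with atomic adds.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sizes, chosen only for illustration.
constexpr std::size_t kDims = 4;          // terms per document vector
constexpr std::size_t kMaxClusters = 64;  // capacity of the cluster table

using Doc = std::array<int, kDims>;       // integer term counts

struct Cluster {
    std::atomic<int> sum[kDims];  // per-term totals of member documents
    std::atomic<int> size{0};     // number of member documents
};

struct Clustering {
    Cluster clusters[kMaxClusters];
    std::atomic<std::size_t> used{0};  // number of claimed cluster slots
    Clustering() {
        for (auto& c : clusters)
            for (auto& s : c.sum) s.store(0);
    }
};

// Cosine similarity between a document and a cluster's (unnormalized) sum.
double cosine(const Doc& d, const Cluster& c) {
    double dot = 0.0, nd = 0.0, nc = 0.0;
    for (std::size_t i = 0; i < kDims; ++i) {
        double ci = c.sum[i].load();
        dot += d[i] * ci;
        nd += static_cast<double>(d[i]) * d[i];
        nc += ci * ci;
    }
    return (nd == 0.0 || nc == 0.0) ? 0.0 : dot / std::sqrt(nd * nc);
}

// Assign one document; safe to call from many threads at once.
// Returns the cluster index, or kMaxClusters if the table is full.
std::size_t add_document(Clustering& cl, const Doc& d, double threshold) {
    assert(threshold > 0.0);  // assumption: empty slots must never match
    std::size_t n = std::min(cl.used.load(), kMaxClusters);
    std::size_t best = kMaxClusters;
    double best_sim = threshold;
    for (std::size_t k = 0; k < n; ++k) {
        double s = cosine(d, cl.clusters[k]);
        if (s >= best_sim) { best_sim = s; best = k; }
    }
    if (best == kMaxClusters) {
        best = cl.used.fetch_add(1);  // atomically claim a fresh slot
        if (best >= kMaxClusters) return kMaxClusters;  // table full
    }
    for (std::size_t i = 0; i < kDims; ++i)
        cl.clusters[best].sum[i].fetch_add(d[i]);  // atomic accumulation
    cl.clusters[best].size.fetch_add(1);
    return best;
}

// Cluster all documents using num_threads workers (strided partition).
std::vector<std::size_t> cluster_all(Clustering& cl,
                                     const std::vector<Doc>& docs,
                                     double threshold,
                                     unsigned num_threads) {
    std::vector<std::size_t> label(docs.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t)
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < docs.size(); i += num_threads)
                label[i] = add_document(cl, docs[i], threshold);
        });
    for (auto& w : workers) w.join();
    return label;
}
```

A call such as `cluster_all(state, docs, 0.5, 4)` processes four documents at a time. One simplification to keep in mind: two similar documents that arrive simultaneously can each open their own cluster, so the concurrent result may differ from one-at-a-time processing; how the paper handles such conflicts is not stated in the abstract.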

This work was supported in part by the National Science Foundation under Grants IIS-08-12377, IIS-09-48548, IIS-10-18475, and IIS-12-19023; and by Google and NVIDIA. J. Sankaranarayanan is currently at NEC Labs.

Corresponding author

Correspondence to Benjamin E. Teitler.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Teitler, B.E., Sankaranarayanan, J., Samet, H., Adelfio, M.D. (2014). Online Document Clustering Using GPUs. In: Catania, B., et al. New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 241. Springer, Cham. https://doi.org/10.1007/978-3-319-01863-8_27

  • DOI: https://doi.org/10.1007/978-3-319-01863-8_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01862-1

  • Online ISBN: 978-3-319-01863-8

  • eBook Packages: Engineering, Engineering (R0)