Abstract
An algorithm for online clustering on the GPU is proposed that makes heavy use of the atomic operations available on the GPU. By relying on these atomic operations, the algorithm can cluster multiple documents in parallel in a way that can saturate all the parallel threads on the GPU. Compared with a baseline GPU algorithm that clusters only one document at a time, it achieves up to a 3X speedup on a real-time news document data set as well as on randomly generated data.
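The abstract does not spell out how atomic operations let several documents be clustered at once, so the following is only an illustrative CPU sketch in Python, not the authors' GPU implementation. It assumes a simple threshold-based online clustering scheme with cosine similarity. The names `AtomicCounter`, `cosine`, and `cluster_batch`, the `threshold` value, and the use of locks as a stand-in for GPU atomic adds are all hypothetical choices made for this example.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor


class AtomicCounter:
    """Lock-based stand-in for an integer atomic add that returns the old value."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def fetch_add(self, delta=1):
        with self._lock:
            old = self._value
            self._value += delta
            return old

    @property
    def value(self):
        with self._lock:
            return self._value


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_batch(docs, sums, counts, threshold=0.8, workers=4):
    """Assign a batch of document vectors to clusters concurrently.

    sums[c] is the component-wise sum of the vectors in cluster c and
    counts[c] is its size; both lists are updated in place.
    Returns the cluster id chosen for each document, in input order.
    """
    n_old = len(sums)
    dim = len(docs[0])
    # Read-only copy of the clusters that existed before this batch.
    snapshot = [list(s) for s in sums]
    # Pre-allocate the worst case in which every document opens a cluster.
    sums.extend([0.0] * dim for _ in docs)
    counts.extend(0 for _ in docs)
    # One lock per slot emulates per-cluster atomic accumulation.
    locks = [threading.Lock() for _ in sums]
    next_slot = AtomicCounter(n_old)

    def assign(doc):
        best, best_sim = -1, threshold
        for cid, centroid in enumerate(snapshot):
            sim = cosine(doc, centroid)
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best < 0:
            # No cluster is similar enough: claim a fresh slot atomically.
            best = next_slot.fetch_add(1)
        with locks[best]:
            for i, x in enumerate(doc):
                sums[best][i] += x
            counts[best] += 1
        return best

    with ThreadPoolExecutor(max_workers=workers) as pool:
        labels = list(pool.map(assign, docs))

    # Drop the pre-allocated slots that no document claimed.
    used = next_slot.value
    del sums[used:]
    del counts[used:]
    return labels
```

In this sketch, documents in the same batch are compared only against clusters that existed before the batch started, so two similar new documents processed together each open their own cluster. How the actual algorithm handles that case is not stated in the abstract.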
This work was supported in part by the National Science Foundation under Grants IIS-08-12377, IIS-09-48548, IIS-10-18475, and IIS-12-19023; and by Google and NVIDIA. J. Sankaranarayanan is currently at NEC Labs.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Teitler, B.E., Sankaranarayanan, J., Samet, H., Adelfio, M.D. (2014). Online Document Clustering Using GPUs. In: Catania, B., et al. New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 241. Springer, Cham. https://doi.org/10.1007/978-3-319-01863-8_27
DOI: https://doi.org/10.1007/978-3-319-01863-8_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01862-1
Online ISBN: 978-3-319-01863-8
eBook Packages: Engineering, Engineering (R0)