GPGPU programming requires adapting existing algorithms, and often inventing new ones, to achieve maximum performance. Solutions created for supercomputers in the nineties are not applicable, since SIMD GPU devices differ in many respects from vector supercomputers. This paper presents a new implementation of a B+-tree index for GPU processors. It may be used when processing parallelism and the order of elements are equally important in a computation. The solution is based on a data representation optimal for GPU processing and an efficient parallel tree creation algorithm. We also compare the GPU B+-tree with other solutions in detail.
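To make the idea of a GPU-friendly representation and parallel tree creation more concrete, the following is a minimal CPU-side sketch in Python. It is not the paper's algorithm: it only assumes that keys are distinct and already sorted, and it stores every tree level as one flat array, so that each entry of an inner level depends only on its own index (the property that would let one GPU thread compute one entry). The names `build_levels`, `search`, and the `fanout` parameter are hypothetical and chosen for illustration.

```python
from bisect import bisect_right


def build_levels(sorted_keys, fanout=4):
    """Build a static B+-tree-like index bottom-up from sorted, distinct keys.

    Returns a list of flat arrays: levels[0] holds all keys (the leaves),
    and each higher level holds the smallest key of every group of
    `fanout` consecutive entries in the level below.
    """
    levels = [list(sorted_keys)]
    while len(levels[-1]) > fanout:
        below = levels[-1]
        # Entry j depends only on below[j * fanout], so all entries of a
        # level are independent and could be computed in parallel.
        above = [below[i] for i in range(0, len(below), fanout)]
        levels.append(above)
    return levels


def search(levels, key, fanout=4):
    """Return the position of `key` in levels[0], or -1 if it is absent."""
    start = 0  # index where the current node begins in its level
    for depth in range(len(levels) - 1, -1, -1):
        level = levels[depth]
        node = level[start:start + fanout]
        # Position of the last entry in this node that is <= key.
        pos = bisect_right(node, key) - 1
        if pos < 0:
            return -1  # key is smaller than every key in the index
        idx = start + pos
        if depth == 0:
            return idx if level[idx] == key else -1
        # The children of entry idx start at idx * fanout one level down.
        start = idx * fanout
    return -1


if __name__ == "__main__":
    keys = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
    index = build_levels(keys, fanout=4)
    print(search(index, 11, fanout=4))
    print(search(index, 4, fanout=4))
```

Because the leaf level keeps all keys in sorted order in one contiguous array, ordered range scans remain simple, while the independent per-entry construction of the inner levels is what makes this layout a plausible starting point for a data-parallel build.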


Binary Search, Thread Block, Memory Latency, Memory Copying, Brute Force Search
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Krzysztof Kaczmarski (1)
  1. Warsaw University of Technology, Warsaw, Poland
