GPU-Accelerated Database Systems: Survey and Open Challenges

  • Sebastian Breß
  • Max Heimel
  • Norbert Siegmund
  • Ladjel Bellatreche
  • Gunter Saake
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8920)


The vast amount of processing power and memory bandwidth provided by modern graphics cards makes them an interesting platform for data-intensive applications. Unsurprisingly, the database research community identified GPUs as effective co-processors for data processing several years ago. In recent years, many approaches have been proposed to exploit GPUs at different levels of a database system. In this paper, we explore the design space of GPU-accelerated database management systems (DBMSs). Based on this survey, we present key properties, important trade-offs, and typical challenges of GPU-aware database architectures, and identify major open challenges. Additionally, we survey existing GPU-accelerated DBMSs and classify their architectural properties. Then, we summarize typical optimizations implemented in GPU-accelerated DBMSs. Finally, we propose a reference architecture that indicates how GPU acceleration can be integrated into existing DBMSs.


GPU-accelerated database, Survey, Co-processing, Modern database architecture



We thank Tobias Lauer from Jedox AG and the anonymous reviewers of the GPUs in Databases Workshop for their helpful feedback on the workshop version of this paper [17]. We thank Jens Teubner from TU Dortmund University, Michael Saecker from ParStream GmbH, and the anonymous reviewers of the TLDKS journal for their helpful comments on the journal version of this paper.


  1. Palo GPU accelerator. White Paper (2010)
  2. Parstream - turning data into knowledge. White Paper, November 2010
  3. Ailamaki, A., DeWitt, D.J., Hill, M.D., Skounakis, M.: Weaving relations for cache performance. In: VLDB, pp. 169–180. Morgan Kaufmann Publishers Inc. (2001)
  4. Andrzejewski, W., Wrembel, R.: GPU-WAH: applying GPUs to compressing bitmap indexes with word aligned hybrid. In: Bringas, P.G., Hameurlain, A., Quirchmayr, G. (eds.) DEXA 2010, Part II. LNCS, vol. 6262, pp. 315–329. Springer, Heidelberg (2010)
  5. Augonnet, C., Thibault, S., Namyst, R., Wacrenier, P.-A.: StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr. Comput. Pract. Exp. 23(2), 187–198 (2011)
  6. Bakkum, P., Chakradhar, S.: Efficient data management for GPU databases (2012).
  7. Bakkum, P., Skadron, K.: Accelerating SQL database operations on a GPU with CUDA. In: GPGPU, pp. 94–103. ACM (2010)
  8. Beier, F., Kilias, T., Sattler, K.-U.: GiST scan acceleration using coprocessors. In: DaMoN, pp. 63–69. ACM (2012)
  9. Binnig, C., Hildenbrand, S., Färber, F.: Dictionary-based order-preserving string compression for main memory column stores. In: SIGMOD, pp. 283–296. ACM (2009)
  10. Boncz, P.A., Kersten, M.L., Manegold, S.: Breaking the memory wall in MonetDB. Commun. ACM 51(12), 77–85 (2008)
  11. Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. In: CIDR, pp. 225–237 (2005)
  12. Borkar, S., Chien, A.A.: The future of microprocessors. Commun. ACM 54(5), 67–77 (2011)
  13. Breß, S.: Why it is time for a HyPE: a hybrid query processing engine for efficient GPU coprocessing in DBMS. The VLDB PhD Workshop, PVLDB 6(12), 1398–1403 (2013)
  14. Breß, S., Beier, F., Rauhe, H., Sattler, K.-U., Schallehn, E., Saake, G.: Efficient co-processor utilization in database query processing. Inf. Syst. 38(8), 1084–1096 (2013)
  15. Breß, S., Geist, I., Schallehn, E., Mory, M., Saake, G.: A framework for cost based optimization of hybrid CPU/GPU query plans in database systems. Control Cybern. 41(4), 715–742 (2012)
  16. Breß, S., Haberkorn, R., Ladewig, S.: CoGaDB reference manual (2014).
  17. Breß, S., Heimel, M., Siegmund, N., Bellatreche, L., Saake, G.: Exploring the design space of a GPU-aware database architecture. In: Catania, B., et al. (eds.) New Trends in Databases and Information Systems. AISC, vol. 241, pp. 225–234. Springer, Heidelberg (2014)
  18. Breß, S., Siegmund, N., Bellatreche, L., Saake, G.: An operator-stream-based scheduling engine for effective GPU coprocessing. In: Catania, B., Guerrini, G., Pokorný, J. (eds.) ADBIS 2013. LNCS, vol. 8133, pp. 288–301. Springer, Heidelberg (2013)
  19. Broneske, D., Breß, S., Heimel, M., Saake, G.: Toward hardware-sensitive database operations. In: EDBT, pp. 229–234 (2014)
  20. Dees, J., Sanders, P.: Efficient many-core query execution in main memory column-stores. In: ICDE, pp. 350–361. IEEE (2013)
  21. Diamos, G., Wu, H., Lele, A., Wang, J., Yalamanchili, S.: Efficient relational algebra algorithms and data structures for GPU. Technical report, Center for Experimental Research in Computer Systems (CERS) (2012)
  22. Fang, R., He, B., Lu, M., Yang, K., Govindaraju, N.K., Luo, Q., Sander, P.V.: GPUQP: query co-processing using graphics processors. In: SIGMOD, pp. 1061–1063. ACM (2007)
  23. Fang, W., He, B., Luo, Q.: Database compression on graphics processors. PVLDB 3, 670–680 (2010)
  24. Gaster, B.R., Howes, L., Kaeli, D., Mistry, P., Schaa, D.: Heterogeneous Computing with OpenCL. Elsevier Sci. Technol. 1–2 (2012)
  25. Ghodsnia, P.: An in-GPU-memory column-oriented database for processing analytical workloads. In: The VLDB PhD Workshop. VLDB Endowment (2012)
  26. Graefe, G.: Encapsulation of parallelism in the Volcano query processing system. In: SIGMOD, pp. 102–111. ACM (1990)
  27. Gregg, C., Hazelwood, K.: Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. In: ISPASS, pp. 134–144. IEEE (2011)
  28. He, B., Fang, W., Luo, Q., Govindaraju, N.K., Wang, T.: Mars: a MapReduce framework on graphics processors. In: PACT, pp. 260–269. ACM (2008)
  29. He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational query co-processing on graphics processors. In: ACM Transactions on Database Systems, vol. 34. ACM (2009)
  30. He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N., Luo, Q., Sander, P.: Relational joins on graphics processors. In: SIGMOD, pp. 511–524. ACM (2008)
  31. He, B., Yu, J.X.: High-throughput transaction executions on graphics processors. PVLDB 4(5), 314–325 (2011)
  32. He, J., Lu, M., He, B.: Revisiting co-processing for hash joins on the coupled CPU-GPU architecture. PVLDB 6(10), 889–900 (2013)
  33. Heimel, M., Markl, V.: A first step towards GPU-assisted query optimization. In: ADMS. VLDB Endowment (2012)
  34. Heimel, M., Saecker, M., Pirk, H., Manegold, S., Markl, V.: Hardware-oblivious parallelism for in-memory column-stores. PVLDB 6(9), 709–720 (2013)
  35. Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K.S., Kersten, M.L.: MonetDB: Two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1), 40–45 (2012)
  36. Ilić, A., Sousa, L.: CHPS: an environment for collaborative execution on heterogeneous desktop systems. Int. J. Netw. Comput. 1(1), 96–113 (2011)
  37. Kaldewey, T., Lohman, G., Mueller, R., Volk, P.: GPU join processing revisited. In: DaMoN, pp. 55–62. ACM (2012)
  38. Kemper, A., Neumann, T.: HyPer: a hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In: ICDE, pp. 195–206. IEEE (2011)
  39. Kossmann, D.: The state of the art in distributed query processing. ACM Comput. Surv. 32(4), 422–469 (2000)
  40. Manegold, S., Boncz, P., Kersten, M.L.: Generic database cost models for hierarchical memory systems. In: PVLDB, pp. 191–202. VLDB Endowment (2002)
  41. Manegold, S., Boncz, P.A., Kersten, M.L.: Optimizing database architecture for the new bottleneck: memory access. VLDB J. 9(3), 231–246 (2000)
  42. Manegold, S., Kersten, M.L., Boncz, P.: Database architecture evolution: mammals flourished long before dinosaurs became extinct. PVLDB 2(2), 1648–1653 (2009)
  43. Mostak, T.: An overview of MapD (massively parallel database). White Paper, Massachusetts Institute of Technology, April 2013.
  44. Neumann, T.: Efficiently compiling efficient query plans for modern hardware. PVLDB 4(9), 539–550 (2011)
  45. NVIDIA. NVIDIA CUDA C programming guide, pp. 31–36, 40, 213–216, Version 6.0. (2014). Accessed 21 April 2014
  46. Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A.E., Purcell, T.J.: A survey of general-purpose computation on graphics hardware. Comput. Graph. Forum 26(1), 80–113 (2007)
  47. Pirk, H.: Efficient cross-device query processing. In: The VLDB PhD Workshop. VLDB Endowment (2012)
  48. Pirk, H., Manegold, S., Kersten, M.: Accelerating foreign-key joins using asymmetric memory channels. In: ADMS, pp. 585–597. VLDB Endowment (2011)
  49. Pirk, H., Manegold, S., Kersten, M.: Waste not... efficient co-processing of relational data. In: ICDE. IEEE (2014)
  50. Przymus, P., Kaczmarski, K.: Dynamic compression strategy for time series database using GPU. In: Catania, B., et al. (eds.) New Trends in Databases and Information Systems. AISC, vol. 241, pp. 235–244. Springer, Heidelberg (2014)
  51. Przymus, P., Kaczmarski, K., Stencel, K.: A bi-objective optimization framework for heterogeneous CPU/GPU query plans. In: CS&P, pp. 342–354. CEUR-WS (2013)
  52. Rabl, T., Poess, M., Jacobsen, H.-A., O’Neil, P., O’Neil, E.: Variations of the star schema benchmark to test the effects of data skew on query performance. In: ICPE, pp. 361–372. ACM (2013)
  53. Rauhe, H., Dees, J., Sattler, K.-U., Faerber, F.: Multi-level parallel query execution framework for CPU and GPU. In: Catania, B., Guerrini, G., Pokorný, J. (eds.) ADBIS 2013. LNCS, vol. 8133, pp. 330–343. Springer, Heidelberg (2013)
  54. Răducanu, B., Boncz, P., Zukowski, M.: Micro adaptivity in Vectorwise. In: SIGMOD, pp. 1231–1242. ACM (2013)
  55. Saecker, M., Markl, V.: Big data analytics on modern hardware architectures: a technology survey. In: Aufaure, M.-A., Zimányi, E. (eds.) eBISS 2012. LNBIP, vol. 138, pp. 125–149. Springer, Heidelberg (2013)
  56. Sanders, J., Kandrot, E.: CUDA by Example: An Introduction to General-Purpose GPU Programming, 1st edn. Addison-Wesley Professional, Upper Saddle River (2010)
  57. Schäler, M., Grebhahn, A., Schröter, R., Schulze, S., Köppen, V., Saake, G.: QuEval: beyond high-dimensional indexing à la carte. PVLDB 6(14), 1654–1665 (2013)
  58. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD, pp. 23–34. ACM (1979)
  59. Tsirogiannis, D., Harizopoulos, S., Shah, M.A.: Analyzing the energy efficiency of a database server. In: SIGMOD, pp. 231–242. ACM (2010)
  60. Viglas, S.D.: Just-in-time compilation for SQL query processing. PVLDB 6(11), 1190–1191 (2013)
  61. Wu, H., Diamos, G., Cadambi, S., Yalamanchili, S.: Kernel weaver: automatically fusing database primitives for efficient GPU computation. In: MICRO, pp. 107–118. IEEE (2012)
  62. Yuan, Y., Lee, R., Zhang, X.: The yin and yang of processing data warehousing queries on GPU devices. PVLDB 6(10), 817–828 (2013)
  63. Zhang, S., He, J., He, B., Lu, M.: OmniDB: towards portable and efficient query processing on parallel CPU/GPU architectures. PVLDB 6(12), 1374–1377 (2013)
  64. Zhong, J., He, B.: Medusa: simplified graph processing on GPUs. IEEE Trans. Parallel Distrib. Syst. 99, 1–14 (2013)
  65. Zhong, J., He, B.: Parallel graph processing on graphics processors made easy. PVLDB 6(12), 1270–1273 (2013)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Sebastian Breß (1)
  • Max Heimel (2)
  • Norbert Siegmund (3)
  • Ladjel Bellatreche (4)
  • Gunter Saake (1)

  1. University of Magdeburg, Magdeburg, Germany
  2. Technische Universität Berlin, Berlin, Germany
  3. University of Passau, Passau, Germany
  4. LIAS/ISAE-ENSMA, Poitiers, France
