Abstract
Graphics Processing Units (GPUs) are popular for their massive parallelism and high-bandwidth memory and are increasingly used in data-intensive applications. In this context, GPU-based In-Memory Key-Value (G-IMKV) stores have been proposed to exploit GPUs' capability for high-throughput indexing operations. State-of-the-art implementations batch requests on the server's CPU before launching a compute kernel to process the operations on the GPU, and they require explicit data movement between the CPU and GPU. However, the startup overhead of kernel launches and memory copies limits the throughput of these frameworks unless operations are batched into large groups.
In this paper, we propose using persistent GPU compute kernels together with OpenSHMEM to maximize GPU and network utilization at smaller batch sizes. This also improves the response time observed by clients while still achieving high throughput at the server. Specifically, clients and servers use OpenSHMEM primitives to move data between the CPU and GPU without intermediate copies, and the server delegates key-value store operations to a persistently running compute kernel on the GPU, which distributes them efficiently across streaming multiprocessors. Experimental results show up to a 4.8x speedup over an existing G-IMKV framework for a small batch of 1000 keys.
This work was sponsored by the U.S. Department of Energy’s Office of Advanced Scientific Computing Research. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research was supported by the United States Department of Defense (DoD) and Computational Research and Development Programs at Oak Ridge National Laboratory.
© 2019 Springer Nature Switzerland AG
Cite this paper
Chu, CH., Potluri, S., Goswami, A., Gorentla Venkata, M., Imam, N., Newburn, C.J. (2019). Designing High-Performance In-Memory Key-Value Operations with Persistent GPU Kernels and OpenSHMEM. In: Pophale, S., Imam, N., Aderholdt, F., Gorentla Venkata, M. (eds) OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity. OpenSHMEM 2018. Lecture Notes in Computer Science(), vol 11283. Springer, Cham. https://doi.org/10.1007/978-3-030-04918-8_10
Print ISBN: 978-3-030-04917-1
Online ISBN: 978-3-030-04918-8