HiPower: A High-Performance RDMA Acceleration Solution for Distributed Transaction Processing

  • Runhua Zhang
  • Yang Cheng
  • Jinkun Geng
  • Shuai Wang
  • Kaihui Gao
  • Guowei Shen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11783)


Increasingly complex tasks and growing data sizes have necessitated distributed transaction processing (DTP), which partitions tasks and data across multiple nodes for joint processing. However, network capability has lagged behind the rapid growth of computation power, making communication an increasingly prominent bottleneck. This paper focuses on the recently emerged RDMA technology, which can greatly improve communication performance but is often poorly exploited because of improper interaction design between the requester and the responder. Our research finds that the typical implementation, confirming per work request (CPWR), triggers considerable CPU involvement, which further degrades the overall performance of RDMA communication. To address this, we propose HiPower, which leverages a batched confirmation scheme with lower CPU utilization to improve the efficiency of high-frequency communication. Our experiments show that, compared with CPWR, HiPower improves communication efficiency by up to 75% and reduces CPU cost by up to 79%, which speeds up the overall flow completion time (FCT) by up to 14% on a real workflow (ResNet-152).
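The contrast between CPWR and batched confirmation can be illustrated with a toy counting model. This is a minimal sketch under stated assumptions, not HiPower's implementation: the function name `count_confirmations`, the batch size of 32, and the rule of confirming a trailing partial batch are all hypothetical, since the abstract does not specify how batches are formed. It counts only how many confirmation steps the requester performs and does not model RDMA hardware or CPU cost.

```python
def count_confirmations(num_requests: int, batch_size: int) -> int:
    """Count confirmation steps a requester performs (toy model).

    batch_size == 1 models confirming per work request (CPWR);
    batch_size > 1 models a batched confirmation scheme.
    Both the name and the batching rule are illustrative assumptions.
    """
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    confirmations = 0
    outstanding = 0  # work requests posted but not yet confirmed
    for _ in range(num_requests):
        outstanding += 1  # post one work request
        if outstanding == batch_size:
            confirmations += 1  # one confirmation covers the whole batch
            outstanding = 0
    if outstanding:
        confirmations += 1  # confirm the final partial batch
    return confirmations


if __name__ == "__main__":
    total = 1000  # hypothetical number of work requests
    print("CPWR confirmations:   ", count_confirmations(total, 1))
    print("Batched confirmations:", count_confirmations(total, 32))
```

In this model the number of confirmation steps falls roughly in proportion to the batch size, which is the intuition behind reducing requester-side CPU involvement. The actual gains reported in the abstract come from the authors' experiments, not from this sketch.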


Keywords: RDMA · Distributed transaction processing · Batched confirmation · One-by-one confirmation



This work is supported by the National Natural Science Foundation of China (No. 61802081) and the Guizhou Provincial Natural Science Foundation (No. 20161052, No. 20183001).



Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  • Runhua Zhang (1, 2, 3)
  • Yang Cheng (2)
  • Jinkun Geng (2)
  • Shuai Wang (2)
  • Kaihui Gao (2)
  • Guowei Shen (1, 3)
  1. Department of Computer Science and Technology, Guizhou University, Guiyang, China
  2. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  3. CETC Big Data Research Institute Co. Ltd., Chengdu, China
