RBPCCM: Relax Blocking Parallel Collective Communication Mechanism Base on Hardware with Scalability

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 600)

Abstract

With the development of parallel computation, the scale of high-performance computing systems has increased dramatically, and collective communication has become a performance bottleneck. Hardware-supported collective communication achieves relatively high performance; however, its scalability remains a crucial problem, because the number of participating nodes is not fixed. This paper proposes the Relax Blocking Parallel Collective Communication Mechanism (RBPCCM) to improve the performance of collective communication in parallel computation. Through hardware-software cooperation, the mechanism implements scalable collective communication by distributing collective resource allocation numbers. Furthermore, RBPCCM supports implementation at various endpoint scales, unconstrained by the interconnect network topology. A functional simulation model based on the Sunway TaihuLight system is built to verify the correctness and scalability of the proposed method. An RBPCCM prototype is implemented in the network interface, and an FPGA platform is constructed for performance testing. The results show that RBPCCM improves latency by a factor of 2.4 to 37 compared with software-based point-to-point communication.
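The reported latency advantage over software point-to-point communication is consistent with the usual step-count gap between serialized point-to-point transfers and tree-structured collectives. The toy comparison below is an illustration of that general gap (an assumption for exposition, not the paper's actual RBPCCM mechanism): a naive root-sends-to-everyone broadcast needs n - 1 sequential sends, while a binomial-tree broadcast needs only ceil(log2(n)) parallel steps.

```python
import math

def p2p_broadcast_steps(n):
    """Naive software broadcast: the root sends the message to each of
    the other n - 1 endpoints one after another (n - 1 serial steps)."""
    return n - 1

def tree_broadcast_steps(n):
    """Binomial-tree broadcast: every endpoint that already holds the
    message forwards it in parallel, so the informed set doubles each
    step; ceil(log2(n)) steps inform all n endpoints."""
    return math.ceil(math.log2(n)) if n > 1 else 0

for n in (2, 64, 4096):
    print(f"{n} endpoints: {p2p_broadcast_steps(n)} serial p2p steps "
          f"vs {tree_broadcast_steps(n)} tree steps")
```

At 4096 endpoints the gap is 4095 serial steps versus 12 tree steps, which is why offloading the tree traversal to network hardware pays off as the node count grows.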



Acknowledgements

This research is supported by the National Science and Technology Major Project under Grant No. 2013ZX0102-8001-001-001.

Author information

Correspondence to Xiang-hui Xie.


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Ren, Xj., Zhou, Z., Peng, Q., Xie, Xh. (2018). RBPCCM: Relax Blocking Parallel Collective Communication Mechanism Base on Hardware with Scalability. In: Xu, W., Xiao, L., Li, J., Zhang, C., Zhu, Z. (eds) Computer Engineering and Technology. NCCET 2017. Communications in Computer and Information Science, vol 600. Springer, Singapore. https://doi.org/10.1007/978-981-10-7844-6_7

Download citation

  • DOI: https://doi.org/10.1007/978-981-10-7844-6_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7843-9

  • Online ISBN: 978-981-10-7844-6

  • eBook Packages: Computer Science (R0)
