Abstract
Partitioned Global Address Space (PGAS) models, typified by languages such as Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key building block for High Performance Computing (HPC) applications. Architectural trends in supercomputing make such programming models increasingly attractive, and newer, more sophisticated models such as UPC++, Legion and Chapel that rely upon similar communication paradigms are gaining popularity. GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in future exascale machines. The library is an evolution of the popular GASNet communication system, building upon over 15 years of lessons learned. We describe and evaluate several features and enhancements that have been introduced to address the needs of modern client systems. Microbenchmark results demonstrate the RMA performance of GASNet-EX is competitive with several MPI-3 implementations on current HPC systems.
Notes
1. On Summitdev we set one environment variable to restrict the MPI implementation to a single rail of the dual-rail network, to provide a meaningful comparison to GASNet-EX. We recommend this configuration because use of a single rail per process can yield significant latency improvements.
Acknowledgments
This research was funded in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Copyright information
© 2019 This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply
Cite this paper
Bonachea, D., Hargrove, P.H. (2019). GASNet-EX: A High-Performance, Portable Communication Library for Exascale. In: Hall, M., Sundar, H. (eds.) Languages and Compilers for Parallel Computing. LCPC 2018. Lecture Notes in Computer Science, vol. 11882. Springer, Cham. https://doi.org/10.1007/978-3-030-34627-0_11
DOI: https://doi.org/10.1007/978-3-030-34627-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34626-3
Online ISBN: 978-3-030-34627-0
eBook Packages: Computer Science (R0)