Abstract
Alltoallv() is a collective operation that allows every process to exchange variable amounts of data with every other process in the communication group. Alltoallv() therefore requires not only \(O(N^2)\) communications among \(N\) processes, but typically also an additional exchange of the data lengths that will be transmitted in the eventual Alltoallv() call. This pre-exchange is used to calculate the proper offsets for the receive buffers on the target processes. We propose two new candidate interfaces for Alltoallv() that relieve the user of setting up this extra exchange of information, at the possible cost of memory efficiency. We explain the new interface variants and show how a single call can be used in place of the traditional Alltoall()/Alltoallv() pair. We then discuss the performance tradeoffs in overall communication and memory costs, as well as software- and hardware-based optimizations and their applicability to the proposed interfaces.
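The offset calculation that the pre-exchange enables can be illustrated with a minimal sketch in plain C. It does not call any OpenSHMEM or MPI routine: the count exchange is simulated by reading one column of a row-major count matrix, and the function names `gather_recv_counts` and `counts_to_displs` are hypothetical, introduced here only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the count pre-exchange (the Alltoall() step).
 * send_counts is a row-major npes x npes matrix in which
 * send_counts[src * npes + dst] is the number of elements src sends to dst.
 * After the exchange, process `me` knows how much each source will send it. */
void gather_recv_counts(const size_t *send_counts, size_t npes, size_t me,
                        size_t *recv_counts)
{
    assert(send_counts != NULL && recv_counts != NULL && me < npes);
    for (size_t src = 0; src < npes; src++)
        recv_counts[src] = send_counts[src * npes + me];
}

/* Exclusive prefix sum: displs[i] is the offset in the receive buffer at
 * which data from source i starts. Returns the total element count, i.e.
 * the receive buffer size needed for the later variable-length exchange. */
size_t counts_to_displs(const size_t *counts, size_t *displs, size_t npes)
{
    assert(counts != NULL && displs != NULL);
    size_t total = 0;
    for (size_t i = 0; i < npes; i++) {
        displs[i] = total;
        total += counts[i];
    }
    return total;
}
```

In a real program the first function would be a collective call and the second would run locally on each process before the variable-length exchange; that two-step pattern is what the proposed single-call interfaces aim to hide.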
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Notes
1. If the null character is unsuitable, the flag parameter can typically be chosen to be the maximum value of the corresponding datatype, e.g. LONG_MAX for long int, since there will typically be overflow issues with a data set if its domain includes this value.
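This excerpt does not show how the flag parameter is used, so the following C sketch rests on an assumption: that the flag marks the end of the valid data inside a fixed-capacity receive slot, which lets the receiver recover the length without a separate count exchange. The helpers `slot_fill` and `slot_length` are hypothetical names and are not part of OpenSHMEM.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: copy n payload values into a fixed-capacity slot and
 * terminate them with `flag` when there is room. Returns 0 on success, or -1
 * if the payload does not fit or itself contains the flag value. */
int slot_fill(long *slot, size_t capacity, const long *src, size_t n, long flag)
{
    assert(slot != NULL && src != NULL);
    if (n > capacity)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (src[i] == flag)
            return -1;          /* payload collides with the sentinel */
        slot[i] = src[i];
    }
    if (n < capacity)
        slot[n] = flag;         /* a completely full slot needs no terminator */
    return 0;
}

/* Hypothetical helper: number of valid elements in a slot, found by scanning
 * for the flag or stopping at the slot capacity. */
size_t slot_length(const long *slot, size_t capacity, long flag)
{
    assert(slot != NULL);
    size_t n = 0;
    while (n < capacity && slot[n] != flag)
        n++;
    return n;
}
```

Under this assumption, choosing LONG_MAX as the flag for long data works as long as the payload never contains that value. Every slot must be sized for the largest possible message, which is one way the memory-efficiency cost mentioned in the abstract could arise.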
Acknowledgements
The work at Oak Ridge National Laboratory (ORNL) is supported by the United States Department of Defense and used the resources of the Extreme Scale Systems Center located at ORNL.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lopez, M.G., Shamis, P., Gorentla Venkata, M. (2015). An Evaluation of OpenSHMEM Interfaces for the Variable-Length Alltoallv() Collective Operation. In: Gorentla Venkata, M., Shamis, P., Imam, N., Lopez, M. (eds) OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies. OpenSHMEM 2014. Lecture Notes in Computer Science, vol 9397. Springer, Cham. https://doi.org/10.1007/978-3-319-26428-8_6
DOI: https://doi.org/10.1007/978-3-319-26428-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26427-1
Online ISBN: 978-3-319-26428-8
eBook Packages: Computer Science (R0)