
Design and Implementation of an Extended Collectives Library for Unified Parallel C

  • Regular Paper
  • Published in: Journal of Computer Science and Technology

Abstract

Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while taking advantage of the scalability of distributed memory architectures. UPC therefore allows programmers to write parallel applications for hybrid shared/distributed memory architectures, such as multi-core clusters, in a more productive way, accessing remote memory through high-level language constructs such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes only a small set of eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the selection of a specific source or destination thread, or the definition of the amount of data transferred by each particular thread. This library fulfills the demands of the UPC developer community and implements portable algorithms that are independent of the specific UPC compiler/runtime being used. A representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results confirm the suitability of the new library to provide easier programming without trading off performance, thus achieving high productivity in parallel programming and harnessing the performance of hybrid shared/distributed memory architectures in high performance computing.
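To make the contrast concrete, the minimal sketch below pairs a call to one of the eight standard collectives, upc_all_gather() from upc_collective.h, with a hypothetical vector-variant prototype in the spirit of the extensions described above (per-thread data sizes and a selectable destination thread). The name upc_all_vector_gather() and its parameters are illustrative assumptions, not the actual API of the extended library.

#include <upc.h>
#include <upc_collective.h>

#define NELEMS 4

/* NELEMS consecutive elements with affinity to each thread */
shared [NELEMS] int src[NELEMS * THREADS];

/* Hypothetical extended collective (assumption for illustration only):
 * gathers ncounts[i] bytes from thread i into contiguous shared memory
 * with affinity to thread 'root', instead of a fixed nbytes per thread
 * and an implicit destination thread.                                  */
void upc_all_vector_gather(shared void *dst, shared const void *src,
                           shared const size_t *ncounts, int root,
                           upc_flag_t flags);

int main(void)
{
    /* Destination area with affinity to thread 0, following the
     * allocation pattern of the UPC collectives specification examples */
    shared [] int *dst =
        (shared [] int *) upc_all_alloc(1, NELEMS * THREADS * sizeof(int));

    /* Standard collective: every thread contributes exactly
     * NELEMS * sizeof(int) bytes, and the result always lands on the
     * thread that dst has affinity to                                */
    upc_all_gather(dst, src, NELEMS * sizeof(int),
                   UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC);

    upc_barrier;
    if (MYTHREAD == 0)
        upc_free(dst);
    return 0;
}

The point of the contrast is only that the standard signature fixes both the destination thread and a uniform amount of data per thread; according to the abstract, the extended library lifts exactly these kinds of restrictions.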



Author information

Corresponding author: Carlos Teijeiro.

Additional information

This work was funded by Hewlett-Packard (Project "Improving UPC Usability and Performance in Constellation Systems: Implementation/Extensions of UPC Libraries"), and partially supported by the Ministry of Science and Innovation of Spain under Project No. TIN2010-16735 and the Galician Government (Consolidation of Competitive Research Groups, Xunta de Galicia ref. 2010/6).

Electronic supplementary material

Supplementary material is available for this article (DOC 27.5 KB).

Cite this article

Teijeiro, C., Taboada, G.L., Touriño, J. et al. Design and Implementation of an Extended Collectives Library for Unified Parallel C. J. Comput. Sci. Technol. 28, 72–89 (2013). https://doi.org/10.1007/s11390-013-1313-9

