Toward a Standard Interface for User-Defined Scheduling in OpenMP

Kale, Vivek; Iwainsky, Christian; Klemm, Michael; Müller Korndörfer, Jonas H.; Ciorba, Florina M.

doi:10.1007/978-3-030-28596-8_13

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11718)

Included in the following conference series:

International Workshop on OpenMP

Abstract

Parallel loops are an important part of OpenMP programs. Efficient scheduling of parallel loops can improve performance of the programs. The current OpenMP specification only offers three options for loop scheduling, which are insufficient in certain instances. Given the large number of other possible scheduling strategies, standardizing each of them is infeasible. A more viable approach is to extend the OpenMP standard to allow a user to define loop scheduling strategies within her application. The approach will enable standard-compliant application-specific scheduling. This work analyzes the principal components required by user-defined scheduling and proposes two competing interfaces as candidates for the OpenMP standard. We conceptually compare the two proposed interfaces with respect to the three host languages of OpenMP, i.e., C, C++, and Fortran. These interfaces serve the OpenMP community as a basis for discussion and prototype implementation supporting user-defined scheduling in an OpenMP library.
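
To make the idea of application-defined loop scheduling concrete, the following is a minimal C sketch of what a program has to hand-roll today when none of the standard `schedule` kinds (static, dynamic, guided) fits. It is not either of the two interfaces proposed in this work. The names `next_chunk_size` and `run_user_scheduled`, the chunk-size policy, and the loop body are all hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical chunk-size policy (an assumption, not taken from the paper):
   hand out the remaining iterations divided by twice the thread count,
   but never fewer than one iteration. */
static long next_chunk_size(long remaining, int nthreads) {
    long c = remaining / (2L * nthreads);
    return c > 0 ? c : 1;
}

/* Fills out[i] = i * i for every i in [0, n), letting each thread
   repeatedly dequeue its next chunk from a shared counter. */
static void run_user_scheduled(long n, int nthreads, long *out) {
    assert(nthreads > 0);
    long next = 0; /* shared: first iteration not yet assigned */

    #pragma omp parallel num_threads(nthreads) shared(next)
    {
        for (;;) {
            long start, end;

            /* Dequeue step: the part a user-defined schedule would supply. */
            #pragma omp critical(uds_dequeue)
            {
                start = next;
                if (start < n)
                    next = start + next_chunk_size(n - start, nthreads);
                if (next > n)
                    next = n;
                end = next;
            }

            if (start >= n)
                break; /* no work left for this thread */

            /* Loop body applied to the dequeued chunk [start, end). */
            for (long i = start; i < end; ++i)
                out[i] = i * i;
        }
    }
}
```

Calling `run_user_scheduled(1000, 4, out)` on an array of 1000 `long` values fills every element exactly once. Compiled without OpenMP support, the pragmas are ignored and the same code runs serially. The sketch shows why a standard interface is attractive: the scheduling logic is entangled with the loop and the loop can no longer be written as a plain `#pragma omp for`.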

Notes

1. We consider the utility of each of the proposals to application programs in an extended version of this work, accessible at the following link: https://arxiv.org/abs/1906.08911.

Acknowledgments

We thank Alice Koniges from Maui HPCC for providing us with access to NERSC’s Cori cluster for experimenting with machine learning applications using OpenMP, which helped us consider a relevant platform for user-defined scheduling. This work is partly funded by the Hessian State Ministry of Higher Education through its grant for the “Hessian Competence Center for High Performance Computing” and by the Swiss National Science Foundation in the context of the “Multi-level Scheduling in Large Scale High Performance Computers” (MLS) grant, number 169123.

Author information

Authors and Affiliations

Brookhaven National Laboratory, Upton, USA
Vivek Kale
Technische Universität Darmstadt, Darmstadt, Germany
Christian Iwainsky
Intel Deutschland GmbH, Feldkirchen, Germany
Michael Klemm
University of Basel, Basel, Switzerland
Jonas H. Müller Korndörfer & Florina M. Ciorba

Corresponding author

Correspondence to Vivek Kale.

Editor information

Editors and Affiliations

University of Auckland, Auckland, New Zealand
Xing Fan
Lawrence Livermore National Laboratory, Livermore, CA, USA
Bronis R. de Supinski
University of Auckland, Auckland, New Zealand
Oliver Sinnen
University of Auckland, Auckland, New Zealand
Nasser Giacaman

About this paper

Cite this paper

Kale, V., Iwainsky, C., Klemm, M., Müller Korndörfer, J.H., Ciorba, F.M. (2019). Toward a Standard Interface for User-Defined Scheduling in OpenMP. In: Fan, X., de Supinski, B., Sinnen, O., Giacaman, N. (eds) OpenMP: Conquering the Full Hardware Spectrum. IWOMP 2019. Lecture Notes in Computer Science, vol 11718. Springer, Cham. https://doi.org/10.1007/978-3-030-28596-8_13

DOI: https://doi.org/10.1007/978-3-030-28596-8_13
Published: 09 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28595-1
Online ISBN: 978-3-030-28596-8
eBook Packages: Computer Science, Computer Science (R0)
