Scalability Analysis of Job Scheduling Using Virtual Nodes

Bobroff, Norman; Coppinger, Richard; Fong, Liana; Seelam, Seetharami; Xu, Jing

doi:10.1007/978-3-642-04633-9_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5798)

Included in the following conference series:

Workshop on Job Scheduling Strategies for Parallel Processing

Abstract

It is important to identify scalability constraints in existing job scheduling software as it is applied to next-generation parallel systems. In this paper, we analyze the scalability of the job scheduling and job dispatching functions in the IBM LoadLeveler job scheduler. To enable this study, we propose and implement a new virtualization method that deploys LoadLeveler clusters of different sizes on a minimal number of physical machines. Our studies with this virtualization show that the LoadLeveler resource manager can comfortably handle over 12,000 compute nodes, the largest scale we have tested so far. However, they also show that static resource matching in the scheduling cycle and job object processing during hierarchical job launching are two impediments to the scalability of LoadLeveler.
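As an illustration only, the following Python sketch shows why static resource matching can become a bottleneck as the node count grows. It is not LoadLeveler's algorithm or API: the `Node` and `Job` types, the `make_virtual_nodes` helper, and the node sizes are all hypothetical, and the matching loop is a deliberately naive stand-in. It models many virtual nodes hosted on a few physical machines and counts how many node checks a single job requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpus: int
    memory_gb: int


@dataclass(frozen=True)
class Job:
    name: str
    cpus: int
    memory_gb: int


def make_virtual_nodes(physical_hosts, per_host):
    """Hypothetical helper: model several virtual nodes on each physical host."""
    return [
        # Assumed sizes: odd-numbered virtual nodes get 8 GB, even ones 16 GB.
        Node(f"host{h}-v{v}", cpus=4, memory_gb=8 if v % 2 else 16)
        for h in range(physical_hosts)
        for v in range(per_host)
    ]


def static_match(job, nodes):
    """Naive matching: compare every node with the job's fixed requirements."""
    checks = 0
    matched = []
    for node in nodes:
        checks += 1
        if node.cpus >= job.cpus and node.memory_gb >= job.memory_gb:
            matched.append(node)
    return matched, checks


if __name__ == "__main__":
    job = Job("j1", cpus=2, memory_gb=16)
    for per_host in (10, 100):
        nodes = make_virtual_nodes(physical_hosts=12, per_host=per_host)
        matched, checks = static_match(job, nodes)
        print(f"nodes={len(nodes)} checks={checks} matched={len(matched)}")
```

In this toy model the number of checks per job equals the number of nodes, so the work in one scheduling cycle grows with both the cluster size and the number of queued jobs. The actual cost profile of LoadLeveler is what the paper measures.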

Author information

Authors and Affiliations

IBM T.J. Watson Research Center, Hawthorne, NY, 10532, USA
Norman Bobroff, Liana Fong & Seetharami Seelam
IBM Systems and Technology Group, Poughkeepsie, NY, 12601, USA
Richard Coppinger
University of Florida, Gainesville, FL, 32611, USA
Jing Xu

Editor information

Editors and Affiliations

Microsoft, 475 Brannan St., 94107, San Francisco, CA, USA
Eitan Frachtenberg
Robotics Research Institute, Section Information Technology, TU Dortmund University, Otto-Hahn-Str. 8, 44227, Dortmund, Germany
Uwe Schwiegelshohn

About this paper

Cite this paper

Bobroff, N., Coppinger, R., Fong, L., Seelam, S., Xu, J. (2009). Scalability Analysis of Job Scheduling Using Virtual Nodes. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2009. Lecture Notes in Computer Science, vol 5798. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04633-9_11

DOI: https://doi.org/10.1007/978-3-642-04633-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04632-2
Online ISBN: 978-3-642-04633-9
eBook Packages: Computer Science, Computer Science (R0)