Scalability Analysis of Job Scheduling Using Virtual Nodes

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5798)

Abstract

It is important to identify scalability constraints in existing job scheduling software as it is applied to next-generation parallel systems. In this paper, we analyze the scalability of the job scheduling and job dispatching functions in the IBM LoadLeveler job scheduler. To enable this scalability study, we propose and implement a new virtualization method that deploys LoadLeveler clusters of different sizes on a minimal number of physical machines. Our scalability studies with this virtualization show that the LoadLeveler resource manager can comfortably handle over 12,000 compute nodes, the largest scale we have tested so far. However, our study also shows that static resource matching in the scheduling cycle and job object processing during hierarchical job launching are two impediments to the scalability of LoadLeveler.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bobroff, N., Coppinger, R., Fong, L., Seelam, S., Xu, J. (2009). Scalability Analysis of Job Scheduling Using Virtual Nodes. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2009. Lecture Notes in Computer Science, vol 5798. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04633-9_11

  • DOI: https://doi.org/10.1007/978-3-642-04633-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04632-2

  • Online ISBN: 978-3-642-04633-9

  • eBook Packages: Computer Science, Computer Science (R0)
