The Importance of Complete Data Sets for Job Scheduling Simulations

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6253)

Abstract

This paper was inspired by a study of the complex data set from the Czech National Grid MetaCentrum. Unlike other widely used workloads from the Parallel Workloads Archive or the Grid Workloads Archive, this data set includes additional information about machine failures, job requirements, and machine parameters, which allows more realistic simulations to be performed. We show that large differences in the performance of various scheduling algorithms appear when this additional information is used. Moreover, we studied other publicly available workloads and partially reconstructed information about their machine failures and job requirements using statistical and analytical models, demonstrating that similar behavior can also be expected for other workloads. We suggest that additional information about both machines and jobs should be incorporated into the workload archives to allow proper and more realistic simulations.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Klusáček, D., Rudová, H. (2010). The Importance of Complete Data Sets for Job Scheduling Simulations. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2010. Lecture Notes in Computer Science, vol 6253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16505-4_8

  • DOI: https://doi.org/10.1007/978-3-642-16505-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16504-7

  • Online ISBN: 978-3-642-16505-4

  • eBook Packages: Computer Science, Computer Science (R0)
