Skip to main content

Modeling Job Arrivals in a Data-Intensive Grid

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2006)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4376))

Included in the following conference series:

Abstract

In this paper we present an initial analysis of job arrivals in a production data-intensive Grid and investigate several traffic models to characterize the interarrival time processes. Our analysis focuses on the heavy-tail behavior and autocorrelation structures, and the modeling is carried out at three different levels: Grid, Virtual Organization (VO), and region. A set of m-state Markov modulated Poisson processes (MMPP) is investigated, while Poisson processes and hyperexponential renewal processes are evaluated for comparison studies. We apply the transportation distance metric from dynamical systems theory to further characterize the differences between the data trace and the simulated time series, and estimate errors by bootstrapping. The experimental results show that MMPPs with a certain number of states are successful to a certain extent in simulating the job traffic at different levels, fitting both the interarrival time distribution and the autocorrelation function. However, MMPPs are not able to match the autocorrelations for certain VOs, in which strong deterministic semi-periodic patterns are observed. These patterns are further characterized using different representations. Future work is needed to model both deterministic and stochastic components in order to better capture the correlation structure in the series.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Asmussen, S., Nerman, O., Olsson, M.: Fitting phase-type distribution via the EM algorithm. Scand. J. Statist. 23, 419–441 (1996)

    MATH  Google Scholar 

  2. Barabasi, A.-L.: The origin of bursts and heavy tails in human dynamics. Nature 435, 207–211 (2005)

    Article  Google Scholar 

  3. Basu, S., Foufoula-Georgiou, E.: Detection of nonlinearity and chaoticity in time series using the transportation distance function. Physics Letters A 301, 413–423 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  4. Beran, J.: Statistics for Long Memory Processes. Chapman and Hall, New York (1994)

    MATH  Google Scholar 

  5. Brémaud, P.: Markov Chains. Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, New York (2001)

    Google Scholar 

  6. Chapin, S.J., et al.: Benchmarks and standards for the evaluation of parallel job schedulers. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999, IPPS-WS 1999, and SPDP-WS 1999. LNCS, vol. 1659, pp. 67–90. Springer, Heidelberg (1999)

    Chapter  Google Scholar 

  7. Cirne, W., Berman, F.: A comprehensive model of the supercomputer workload. In: IEEE 4th Annual Workshop on Workload Characterization, IEEE Computer Society Press, Los Alamitos (2001)

    Google Scholar 

  8. Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Applications. Cambridge University Press, Cambridge (1997)

    Google Scholar 

  9. Downey, A.B., Feitelson, D.G.: The elusive goal of workload characterization. Performance Evaluation Review 26(4), 14–29 (1999)

    Article  Google Scholar 

  10. Dumitrescu, C., Raicu, I., Foster, I.: DI-GRUBER: A Distributed Approach to Grid Resource Brokering. In: Proceedings of Supercomputing ’05, ACM Press, New York (2005)

    Google Scholar 

  11. Workload Management in EGEE and gLite. http://lxmi.mi.infn.it/egee-jra1-wm/

  12. EMpht program. http://home.imf.au.dk/asmus/

  13. Feitelson, D.G.: Workload modeling for performance evaluation. In: Calzarossa, M.C., Tucci, S. (eds.) Performance 2002. LNCS, vol. 2459, pp. 114–141. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  14. Fischer, W., Meier-Hellstern, K.: The Markov-modulated Poisson process (MMPP) cookbook. Performance Evaluation 18(2), 149–171 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  15. Li, H.: Tools for Workload Modeling in the Grid. http://www.liacs.nl/home/hli/gwm/

  16. Heffes, H., Lucantoni, D.M.: A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance. IEEE J. on Sel. Areas in Comm. 4(6), 856–868 (1986)

    Article  Google Scholar 

  17. Jagerman, D.L., Melamed, B., Willinger, W.: Stochastic modeling of traffic processes. In: Frontiers in Queueing: Models, Methods and Problems, CRC Press, Boca Raton (1996)

    Google Scholar 

  18. Kantz, H., Schreiber, T.: Nonlinear Time Series Analysis. Cambridge University Press, Cambridge (2003)

    Google Scholar 

  19. Karagiannis, T., Faloutsos, M.: SELFIS: A Tool For Self-Similarity and Long-Range Dependence Analysis. In: 1st Workshop on Fractals and Self-Similarity in Data Mining: Issues and Approaches, Canada (2002)

    Google Scholar 

  20. Künsch, H.R.: The jackknife and bootstrap for general stationary observations. The Annals of Statistics 17, 1217–1241 (1989)

    Article  MathSciNet  MATH  Google Scholar 

  21. The Worldwide LHC Computing Grid project. http://lcg.web.cern.ch/LCG/

  22. Leland, W., et al.: On the self-similar nature of ethernet traffic (extended version). IEEE/ACM Trans. on Networking 2(1), 1–15 (1994)

    Article  Google Scholar 

  23. Li, H., Groep, D., Wolters, L.: Workload Characteristics of a Multi-cluster Supercomputer. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 176–193. Springer, Heidelberg (2005)

    Google Scholar 

  24. lp_solve 5.5.0.7. http://lpsolve.sourceforge.net/5.5/

  25. Lublin, U., Feitelson, D.G.: The workload on parallel supercomputers: modeling the characteristics of rigid jobs. J. Para. and Dist. Comput. 63(11), 1105–1122 (2003)

    Article  MATH  Google Scholar 

  26. Löbel, A.: Solving large-scale real-world minimum-cost flow problems by a network simplex method. Technical Report SC 96-7, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB) (February 1996), Software available at http://www.zib.de/Optimization/Software/Mcf/

  27. Medernach, E.: Workload analysis of a cluster in a Grid environment. In: Feitelson, D.G., et al. (eds.) JSSPP 2005. LNCS, vol. 3834, Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  28. Moeckel, R., Murray, B.: Measuring the distance between timeseries. Physica D 102, 187–194 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  29. Muskulus, M., et al.: Estimating differences between probability densities and time series (In preparation), Software available at http://www.math.leidenuniv.nl/~muskulus/

  30. Neuts, M.F.: Structured Stochastic Matrices of M/G/1-type and their Applications. Marcel Dekker, New York (1989)

    MATH  Google Scholar 

  31. Nabrzyski, J., Schopf, J.M., Weglarz, J.: Grid Resource Management: State of the Art and Future Trends. International Series in Operations Research & Management Science. Springer, Heidelberg (2003)

    Google Scholar 

  32. Politis, D.N.: The Impact of Bootstrap Methods on Time Series Analysis. Statistical Science 18(2), 219–230 (2003)

    Article  MathSciNet  Google Scholar 

  33. Parallel Workload Archive. http://www.cs.huji.ac.il/labs/parallel/workload/

  34. Riska, A.: Aggregate Matrix-analytic Techniques and their Applications. PhD thesis, Department of Computer Science, College of William and Mary (2002)

    Google Scholar 

  35. Roberts, W.J.J., Ephraim, Y., Dieguez, E.: On Ryden’s EM algorithm for estimating MMPP’s. IEEE Sig. Proc. Let. (to appear)

    Google Scholar 

  36. The LCG Real Time Monitor. http://gridportal.hep.ph.ic.ac.uk/rtm/

  37. Ryden, T.: Parameter estimation for Markov modulated Poisson processes. Communications in Statistics - Stochastic Models 10(4), 795–829 (1994)

    Article  MathSciNet  MATH  Google Scholar 

  38. Ryden, T.: An EM algorithm for estimation in Markov-modulated Poisson processes. Comp. Stat. and Data Analysis 21, 431–447 (1996)

    Article  MathSciNet  MATH  Google Scholar 

  39. Schrijver, A.: Theory of Linear and Integer Programming. Wiley, Chichester (1998)

    MATH  Google Scholar 

  40. Scott, S.L.: Bayesian Methods for Hidden Markov Models: Recursive Computing in the 21st Century. J. Am. Stat. Assoc. 97(457), 337–351 (2002)

    Article  MATH  Google Scholar 

  41. Song, B., Ernemann, C., Yahyapour, R.: Parallel Computer Workload Modeling with Markov Chains. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 47–62. Springer, Heidelberg (2005)

    Google Scholar 

  42. Squillante, M.S., Yao, D.D., Zhang, L.: The impact of job arrival patterns on parallel scheduling. ACM SIGMETRICS Performance Evaluation Review 26(4), 52–59 (1999)

    Article  Google Scholar 

  43. Takeuchi, J.I., Yamanishi, K.: A Unified Framework for Detecting Outliers and Change Points from Time Series. IEEE Transactions on Knowledge and Data Engineering 18(4), 482–492 (2006)

    Article  Google Scholar 

  44. Willinger, W., Taqqu, M.S., Erramilli, A.: A Bibliographical Guide to Self-Similar Traffic and Performance Modeling for Modern High-Speed Networks. In: Stochastic Networks: Theory and Applications, pp. 339–366. Oxford University Press, Oxford (1996)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Eitan Frachtenberg Uwe Schwiegelshohn

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Li, H., Muskulus, M., Wolters, L. (2007). Modeling Job Arrivals in a Data-Intensive Grid. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2006. Lecture Notes in Computer Science, vol 4376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71035-6_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-71035-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71034-9

  • Online ISBN: 978-3-540-71035-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics