Abstract
As large-scale computing systems grow toward exascale, Resources and Jobs Management Systems (RJMS) must evolve to cope with this change of scale. However, studying them is difficult: they are critical production systems, on which experimenting is extremely costly in downtime and energy. Meanwhile, many scheduling algorithms emerging from theoretical studies have not been transferred to production tools for lack of realistic experimental validation. To tackle these problems we propose Batsim, an extendable, language-independent and scalable RJMS simulator. It allows researchers and engineers to test and compare any scheduling algorithm through a simple event-based communication interface, which supports different levels of realism. In this paper we show that Batsim’s behaviour matches that of the real RJMS OAR. Our evaluation process was designed with reproducibility in mind, and all the experiment material is freely available.
Notes
1. NPB 3.3.1 is available here [25].
References
Balouek, D., et al.: Adding virtualization capabilities to the grid’5000 testbed. In: Ivanov, I.I., Sinderen, M., Leymann, F., Shan, T. (eds.) CLOSER 2012. CCIS, vol. 367, pp. 3–20. Springer, Cham (2013). doi:10.1007/978-3-319-04519-1_1
Barcelona Supercomputing Center: Extrae, February 2016. https://www.bsc.es/computer-sciences/extrae
Bedaride, P., Degomme, A., Genaud, S., Legrand, A., Markomanolis, G., Quinson, M., Stillwell, M., Suter, F., Videau, B.: Toward better simulation of MPI applications on ethernet/TCP networks, November 2013. https://hal.inria.fr/hal-00919507/document
Bell, W.H., Cameron, D.G., Millar, A.P., Capozza, L., Stockinger, K., Zini, F.: Optorsim: a grid simulator for studying dynamic data replication strategies. Int. J. High Perform. Comput. Appl. 17(4), 403–416 (2003)
Caniou, Y., Gay, J.-S.: Simbatch: an API for simulating and predicting the performance of parallel resources managed by batch systems. In: César, E., et al. (eds.) Euro-Par 2008. LNCS, vol. 5415, pp. 223–234. Springer, Heidelberg (2009). doi:10.1007/978-3-642-00955-6_27
Capit, N., Da Costa, G., Georgiou, Y., Huard, G., Martin, C., Mounié, G., Neyron, P., Richard, O.: A batch scheduler with high level components. In: IEEE International Symposium on Cluster Computing and the Grid, 2005. CCGrid 2005, vol. 2, pp. 776–783. IEEE (2005)
Casanova, H., Giersch, A., Legrand, A., Quinson, M., Suter, F.: Versatile, scalable, and accurate simulation of distributed applications and platforms. J. Parallel Distrib. Comput. 74(10), 2899–2917 (2014). http://hal.inria.fr/hal-01017319
Clauss, P.N., Stillwell, M., Genaud, S., Suter, F., Casanova, H., Quinson, M.: Single node on-line simulation of MPI applications with SMPI, May 2011. https://hal.inria.fr/inria-00527150/document
Diaz, A., Batista, R., Castro, O.: Realtss: a real-time scheduling simulator. In: 4th International Conference on Electrical and Electronics Engineering, 2007. ICEEE 2007, pp. 165–168. IEEE (2007)
Dutot, P.-F., Poquet, M., Trystram, D.: Communication models insights meet simulations. In: Hunold, S., et al. (eds.) Euro-Par 2015. LNCS, vol. 9523, pp. 258–269. Springer, Cham (2015). doi:10.1007/978-3-319-27308-2_22
Estrada, T., Flores, D., Taufer, M., Teller, P.J., Kerstens, A., Anderson, D.P., et al.: The effectiveness of threshold-based scheduling policies in BOINC projects. In: Second IEEE International Conference on e-Science and Grid Computing, 2006. e-Science 2006, p. 88. IEEE (2006)
Feitelson, D.G.: Workload Modeling for Computer Systems Performance Evaluation. Cambridge University Press, Cambridge (2015). https://cds.cern.ch/record/2005898
Grid5000: Nancy: Home - Grid5000, February 2016. https://www.grid5000.fr/mediawiki/index.php/Nancy:Home
Imbert, M., Pouilloux, L., Rouzaud-Cornabas, J., Lèbre, A., Hirofuchi, T.: Using the EXECO toolbox to perform automatic and reproducible cloud experiments, December 2013. https://hal.inria.fr/hal-00861886
Inria: InriaForge: Evalys: Projet Home. https://gforge.inria.fr/projects/evalys
Inria: BatSim Homepage, February 2016. http://batsim.gforge.inria.fr/
Inria: InriaForge: Batsimctn: Project Home, February 2016. https://gforge.inria.fr/projects/simctn/
Inria: InriaForge: expe_batsim: Project Home, February 2016. https://gforge.inria.fr/projects/expe-batsim
Inria: Welcome to execo–execo v2.5.3, February 2016. http://execo.gforge.inria.fr/doc/latest-stable/
Jones, W.M., Ligon III, W.B., Pang, L.W., Stanzione, D.: Characterization of bandwidth-aware meta-schedulers for co-allocating jobs across multiple clusters. J. Supercomput. 34(2), 135–163 (2005)
Klusáček, D., Rudová, H.: Alea 2: job scheduling simulator. In: Proceedings of the 3rd International ICST Conference on Simulation Tools and Techniques, p. 61. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering) (2010)
Legrand, A.: Simgrid Usages, January 2016. http://simgrid.gforge.inria.fr/Usages.php
Lucarelli, G., Mendonca, F., Trystram, D., Wagner, F.: Contiguity and locality in backfilling scheduling. In: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 586–595. IEEE (2015)
Mercier, M.: MPI+PRV+TIT-traces_nas-Benchmarks_2016-02-08-10-10-44, February 2016. http://academictorrents.com/details/53b46a4ff43a8ae91f674b26c65c5cc6187f4f8e
NASA: NAS Parallel Benchmarks, February 2016. https://www.nas.nasa.gov/publications/npb.html
Pascual, J.A., Miguel-Alonso, J., Lozano, J.A.: Locality-aware policies to improve job scheduling on 3D tori. J. Supercomput. 71(3), 966–994 (2015)
Ridruejo Perez, F.J., Miguel-Alonso, J.: INSEE: an interconnection network simulation and evaluation environment. In: Cunha, J.C., Medeiros, P.D. (eds.) Euro-Par 2005. LNCS, vol. 3648, pp. 1014–1023. Springer, Heidelberg (2005). doi:10.1007/11549468_111
Phatanapherom, S., Uthayopas, P., Kachitvichyanukul, V.: Dynamic scheduling II: fast simulation model for grid scheduling using HyperSim. In: Proceedings of the 35th Conference on Winter Simulation: Driving Innovation, pp. 1494–1500. Winter Simulation Conference (2003)
Proebsting, T., Warren, A.M.: Repeatability and benefaction in computer systems research. Technical report, The University of Arizona (2015). http://reproducibility.cs.arizona.edu/v2/RepeatabilityTR.pdf
Ruiz, C., Harrache, S., Mercier, M., Richard, O.: Reconstructable software appliances with kameleon. SIGOPS Oper. Syst. Rev. 49(1), 80–89 (2015)
Stanisic, L., Legrand, A.: Effective reproducible research with org-mode and Git. In: Lopes, L., et al. (eds.) Euro-Par 2014. LNCS, vol. 8805, pp. 475–486. Springer, Cham (2014). doi:10.1007/978-3-319-14325-5_41
Takefusa, A., Matsuoka, S., Nakada, H., Aida, K., Nagashima, U.: Overview of a performance evaluation system for global computing scheduling algorithms. In: Proceedings of the Eighth International Symposium on High Performance Distributed Computing, pp. 97–104. IEEE (1999)
tcbozzetti: tcbozzetti/trabalhoconclusao, February 2016. https://github.com/tcbozzetti/trabalhoconclusao
oar team: Batsim protocol description (2016). https://github.com/oar-team/batsim/blob/master/doc/proto_description.md
oar team: Kamelot (2016). https://github.com/oar-team/oar3/blob/master/oar/kao/kamelot.py
Xia, H., Dail, H., Casanova, H., Chien, A.: The microgrid: using emulation to predict application performance in diverse grid network environments. In: Proceedings of the Workshop on Challenges of Large Applications in Distributed Environments (2004)
Yu, J., Buyya, R.: A taxonomy of scientific workflow systems for grid computing. SIGMOD Rec. 34(3), 44–49 (2005)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Dutot, PF., Mercier, M., Poquet, M., Richard, O. (2017). Batsim: A Realistic Language-Independent Resources and Jobs Management Systems Simulator. In: Desai, N., Cirne, W. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2015, JSSPP 2016. Lecture Notes in Computer Science, vol 10353. Springer, Cham. https://doi.org/10.1007/978-3-319-61756-5_10
DOI: https://doi.org/10.1007/978-3-319-61756-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61755-8
Online ISBN: 978-3-319-61756-5
eBook Packages: Computer Science, Computer Science (R0)