Abstract
For bioinformatics workflows to achieve good performance on shared clusters, resources must be allocated according to the needs of the bioinformatics applications they contain.
The time-varying status of the cluster, caused by its dynamic workload, must also be taken into account. Users of bioinformatics applications face the dilemma of writing an adequate job description without any prior indication of the resources their applications will use. As a result, they fall back on naive approaches, and both platform efficiency and their own goals, such as makespan or cost, are compromised. To prevent this, we propose a Resource Manager (RM) for bioinformatics workflows running on shared clusters that improves platform efficiency and reduces the average makespan of queued applications.
Our RM contains a predictor that generates multiple job performance predictions for different combinations of resources. We also include a shared-resource model that takes into account the degree of multiprogramming (DP) of the nodes and determines which applications are most compatible for sharing the resources of the same node. With this information, we developed a scheduling algorithm that is compatible with the cluster's default resource manager, i.e., SLURM.
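The interplay of the three components (multiple predictions per resource combination, a DP-aware sharing model, and a SLURM-compatible placement) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how predictions are computed or how compatibility is scored, so this sketch assumes an Amdahl's-law runtime model as a stand-in predictor and a hypothetical interference factor built from per-application CPU and I/O intensities. All names (`App`, `Node`, `Config`, `predict_runtime`, `interference`, `place`, `sbatch_args`) are hypothetical; only the emitted `sbatch` options (`--job-name`, `--nodelist`, `--nodes`, `--ntasks`, `--cpus-per-task`, `--mem`) are standard SLURM flags.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Config:
    """One candidate resource combination for a job (hypothetical)."""
    cores: int
    mem_gb: int


@dataclass(frozen=True)
class App:
    """Hypothetical application profile used by the stand-in predictor."""
    name: str
    serial_time_s: float      # runtime on a single core, in seconds
    parallel_fraction: float  # Amdahl parallel fraction, 0..1
    min_mem_gb: int           # smallest memory request that can hold the job
    cpu_intensity: float      # 0..1, how strongly the app stresses the CPU
    io_intensity: float       # 0..1, how strongly the app stresses I/O


@dataclass
class Node:
    name: str
    free_cores: int
    free_mem_gb: int
    max_dp: int                                         # max degree of multiprogramming
    residents: List[App] = field(default_factory=list)  # apps already on the node


def predict_runtime(app: App, cfg: Config) -> float:
    """Stand-in predictor (Amdahl's law); infinite if memory is too small."""
    if cfg.mem_gb < app.min_mem_gb:
        return float("inf")
    serial = (1 - app.parallel_fraction) * app.serial_time_s
    return serial + app.parallel_fraction * app.serial_time_s / cfg.cores


def predictions(app: App, cores_opts, mem_opts) -> Dict[Config, float]:
    """One runtime prediction per combination of cores and memory."""
    return {Config(c, m): predict_runtime(app, Config(c, m))
            for c, m in product(cores_opts, mem_opts)}


def interference(a: App, b: App) -> float:
    """Hypothetical slowdown factor: grows when both apps stress the same resource."""
    return 1.0 + a.cpu_intensity * b.cpu_intensity + a.io_intensity * b.io_intensity


def place(app: App, nodes: List[Node], cores_opts, mem_opts
          ) -> Optional[Tuple[Node, Config, float]]:
    """Pick the (node, config) pair with the lowest interference-adjusted runtime,
    respecting free resources and each node's DP limit; ties favour fewer resources."""
    preds = predictions(app, cores_opts, mem_opts)
    best = None
    for node in nodes:
        if len(node.residents) >= node.max_dp:
            continue  # node already runs as many apps as its DP allows
        factor = max((interference(app, r) for r in node.residents), default=1.0)
        for cfg, t in preds.items():
            if cfg.cores > node.free_cores or cfg.mem_gb > node.free_mem_gb:
                continue  # configuration does not fit in the node's free resources
            key = (t * factor, cfg.cores, cfg.mem_gb)
            if best is None or key < best[0]:
                best = (key, node, cfg)
    if best is None or best[0][0] == float("inf"):
        return None  # nothing fits, or no config has enough memory
    return best[1], best[2], best[0][0]


def sbatch_args(app: App, node: Node, cfg: Config) -> List[str]:
    """Translate a placement into standard sbatch options."""
    return [f"--job-name={app.name}", f"--nodelist={node.name}", "--nodes=1",
            "--ntasks=1", f"--cpus-per-task={cfg.cores}", f"--mem={cfg.mem_gb}G"]


if __name__ == "__main__":
    aligner = App("align", 3600.0, 0.9, 8, cpu_intensity=0.9, io_intensity=0.2)
    cpu_node = Node("node01", 8, 32, 2, [App("cpu-bound", 600.0, 0.5, 1, 0.9, 0.1)])
    io_node = Node("node02", 8, 32, 2, [App("io-bound", 600.0, 0.5, 1, 0.1, 0.9)])
    result = place(aligner, [cpu_node, io_node], [1, 2, 4, 8], [8, 16])
    if result is not None:
        node, cfg, t = result
        print("sbatch", " ".join(sbatch_args(aligner, node, cfg)), f"# ~{t:.0f} s")
```

With these made-up profiles, the CPU-bound aligner is placed on `node02`, next to the I/O-bound resident, with 8 cores and 8 GB: its interference factor there is 1.27, versus 1.83 beside the CPU-bound resident on `node01`. Scoring every (node, configuration) pair, rather than fixing the request first, is what lets the predictions and the sharing model inform the same decision.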
Finally, we test our RM on a set of queued workflows, each composed of multiple applications. Our results show that a 28% reduction in makespan and a 75% improvement in resource efficiency can be achieved.
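The makespan figure can be read as a relative reduction. A minimal formalization follows, assuming that the makespan of a workflow $w$ in the queued set $W$ runs from its submission time $S_w$ to the completion time $C_w$ of its last application, and that the 28% is measured against a baseline run without the RM. Both are assumptions, since the abstract does not define them.

```latex
\bar{M} = \frac{1}{|W|} \sum_{w \in W} \left( C_w - S_w \right),
\qquad
\Delta_M = \frac{\bar{M}_{\text{base}} - \bar{M}_{\text{RM}}}{\bar{M}_{\text{base}}} = 0.28
\;\Longrightarrow\;
\bar{M}_{\text{RM}} = 0.72\,\bar{M}_{\text{base}}
```

For a hypothetical queue whose workflows averaged 100 minutes under the baseline, this corresponds to an average of about 72 minutes under the RM.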
Funded by the Spanish Ministry of Economy. Project number: TIN2014-53234-C2-1-R.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Badosa, F., Acevedo, C., Espinosa, A., Vera, G., Ripoll, A. (2017). A Resource Manager for Maximizing the Performance of Bioinformatics Workflows in Shared Clusters. In: Ibrahim, S., Choo, KK., Yan, Z., Pedrycz, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2017. Lecture Notes in Computer Science, vol 10393. Springer, Cham. https://doi.org/10.1007/978-3-319-65482-9_35
DOI: https://doi.org/10.1007/978-3-319-65482-9_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65481-2
Online ISBN: 978-3-319-65482-9
eBook Packages: Computer Science, Computer Science (R0)