A Resource Manager for Maximizing the Performance of Bioinformatics Workflows in Shared Clusters

Badosa, Ferran; Acevedo, César; Espinosa, Antonio; Vera, Gonzalo; Ripoll, Ana

doi:10.1007/978-3-319-65482-9_35

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10393)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

For bioinformatics workflows to perform well on shared clusters, resources must be allocated properly, in line with the needs of the applications that make up each workflow.

The cluster status, which changes over time under a dynamic workload, must also be considered. Users of bioinformatics applications face the dilemma of providing an adequate job description without prior knowledge of the resources their applications use. As a result, they resort to naive approaches, and both platform efficiency and users’ goals, such as makespan or cost, are compromised. To prevent this, we propose a Resource Manager (RM) for bioinformatics workflows running in shared clusters, capable of improving platform efficiency and reducing the average makespan of queued applications.

Our RM contains a predictor that generates multiple performance predictions for each job under different combinations of resources. It also includes a shared-resource model that considers the degree of multiprogramming (DP) of the nodes and determines which applications are most compatible for sharing same-node resources. With this information, we developed a scheduling algorithm capable of operating in compliance with the cluster’s default manager, i.e., SLURM.
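To make the idea of choosing among several per-job predictions concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical `Prediction` record (cores, DP, predicted runtime) and a hypothetical `pick_allocation` helper that selects the fastest prediction fitting the currently free cores and an allowed DP. All numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Prediction:
    """One hypothetical performance prediction for a job."""
    cores: int        # cores the job would request
    dp: int           # degree of multiprogramming assumed on the node
    runtime_s: float  # predicted runtime in seconds (illustrative value)


def pick_allocation(predictions: Iterable[Prediction],
                    free_cores: int,
                    max_dp: int) -> Optional[Prediction]:
    """Return the feasible prediction with the lowest predicted runtime.

    A prediction is feasible if it fits in the free cores and does not
    exceed the allowed DP. Ties are broken in favour of fewer cores.
    Returns None when nothing fits.
    """
    feasible = [p for p in predictions
                if p.cores <= free_cores and p.dp <= max_dp]
    if not feasible:
        return None
    return min(feasible, key=lambda p: (p.runtime_s, p.cores))


# Invented predictions for a single queued job.
predictions = [
    Prediction(cores=4, dp=1, runtime_s=1200.0),
    Prediction(cores=8, dp=1, runtime_s=700.0),
    Prediction(cores=8, dp=2, runtime_s=900.0),
    Prediction(cores=16, dp=1, runtime_s=450.0),
]

choice = pick_allocation(predictions, free_cores=8, max_dp=2)
print(choice)
```

A real scheduler would also have to account for queue state and for which applications are compatible when sharing a node; this sketch covers only the selection step for one job.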

Finally, we test our RM on a set of queued workflows, each formed by multiple applications. We show that a 28% makespan reduction and a 75% resource efficiency improvement can be achieved.

Funded by the Spanish Economy ministry. Project Number: TIN2014-53234-C2-1-R.

Author information

Authors and Affiliations

Universitat Autònoma de Barcelona, Bellaterra, Spain
Ferran Badosa, César Acevedo, Antonio Espinosa & Ana Ripoll
Centre for Research in Agricultural Genomics, Bellaterra, Spain
Gonzalo Vera

Corresponding author

Correspondence to Ferran Badosa.

Editor information

Editors and Affiliations

Inria, Rennes, France
Shadi Ibrahim
University of Texas at San Antonio, San Antonio, Texas, USA
Kim-Kwang Raymond Choo
Aalto University, Espoo, Finland
Zheng Yan
University of Alberta, Edmonton, Alberta, Canada
Witold Pedrycz

About this paper

Cite this paper

Badosa, F., Acevedo, C., Espinosa, A., Vera, G., Ripoll, A. (2017). A Resource Manager for Maximizing the Performance of Bioinformatics Workflows in Shared Clusters. In: Ibrahim, S., Choo, KK., Yan, Z., Pedrycz, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2017. Lecture Notes in Computer Science, vol 10393. Springer, Cham. https://doi.org/10.1007/978-3-319-65482-9_35

DOI: https://doi.org/10.1007/978-3-319-65482-9_35
Published: 11 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65481-2
Online ISBN: 978-3-319-65482-9
eBook Packages: Computer Science, Computer Science (R0)
