A Resource Manager for Maximizing the Performance of Bioinformatics Workflows in Shared Clusters

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2017)

Abstract

For bioinformatics workflows to perform well on shared clusters, resources must be allocated properly and matched to the needs of the applications each workflow contains.

The time-varying cluster status caused by the dynamic workload must also be considered. Users of bioinformatics applications face the dilemma of providing an adequate job description without prior knowledge of the resources their applications will use. As a result, they resort to naive approaches, and both platform efficiency and users' goals, such as makespan or cost, are compromised. To prevent that, we propose a Resource Manager (RM) for bioinformatics workflows running in shared clusters, capable of improving platform efficiency and reducing the average makespan of queued applications.

Our RM contains a predictor that generates multiple performance predictions for each job, one per combination of resources. We also include a shared-resource model that considers the degree of multiprogramming (DP) of the nodes and determines which applications are most compatible for sharing same-node resources. With this information, we developed a scheduling algorithm that operates in compliance with the cluster's default manager, i.e., SLURM.
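To make the idea of choosing among several per-configuration predictions concrete, the following is a minimal sketch and not the authors' predictor or scheduling algorithm. It assumes each prediction is a (cores, memory, runtime) triple and that the scheduler simply picks the fastest predicted configuration that fits the currently free resources of a node. The names `Prediction`, `pick_config` and `sbatch_args` and all numeric values are hypothetical; only the `sbatch` options `--cpus-per-task` and `--mem` are real SLURM flags.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Prediction:
    """Hypothetical predictor output for one resource combination."""
    cores: int
    mem_gb: int
    runtime_s: float  # predicted runtime in seconds


def pick_config(predictions: List[Prediction],
                free_cores: int,
                free_mem_gb: int) -> Optional[Prediction]:
    """Return the fastest predicted configuration that fits the free
    resources, or None if no configuration fits (assumed selection rule)."""
    feasible = [p for p in predictions
                if p.cores <= free_cores and p.mem_gb <= free_mem_gb]
    return min(feasible, key=lambda p: p.runtime_s, default=None)


def sbatch_args(p: Prediction) -> List[str]:
    """Translate a chosen configuration into sbatch resource options."""
    return ["sbatch", f"--cpus-per-task={p.cores}", f"--mem={p.mem_gb}G"]


# Illustrative, made-up predictions for a single application.
predictions = [
    Prediction(cores=4, mem_gb=8, runtime_s=1200.0),
    Prediction(cores=8, mem_gb=16, runtime_s=700.0),
    Prediction(cores=16, mem_gb=32, runtime_s=450.0),
]

chosen = pick_config(predictions, free_cores=8, free_mem_gb=64)
```

With 8 free cores, the 16-core configuration is infeasible, so the sketch falls back to the 8-core one. A scheduler that also accounts for same-node sharing, as the shared-resource model in the abstract does, would need a compatibility measure that this sketch leaves out.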

Finally, we test our RM on a set of queued workflows, each formed by multiple applications. The results show that a 28% makespan reduction and a 75% resource efficiency improvement can be achieved.

Funded by the Spanish Economy ministry. Project Number: TIN2014-53234-C2-1-R.

Author information

Corresponding author

Correspondence to Ferran Badosa.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Badosa, F., Acevedo, C., Espinosa, A., Vera, G., Ripoll, A. (2017). A Resource Manager for Maximizing the Performance of Bioinformatics Workflows in Shared Clusters. In: Ibrahim, S., Choo, KK., Yan, Z., Pedrycz, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2017. Lecture Notes in Computer Science, vol 10393. Springer, Cham. https://doi.org/10.1007/978-3-319-65482-9_35

  • DOI: https://doi.org/10.1007/978-3-319-65482-9_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65481-2

  • Online ISBN: 978-3-319-65482-9

  • eBook Packages: Computer Science, Computer Science (R0)
