Container Orchestration on HPC Clusters

  • Marco Enrico Piras
  • Luca Pireddu
  • Marco Moro
  • Gianluigi Zanetti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11887)

Abstract

The use of software containers and services in science is a growing trend that is poorly served by the HPC resources typically available in research settings. We propose a method to grow Kubernetes clusters onto transient nodes allocated through the Grid Engine batch workload manager. The method is being used to run a mix of data-intensive service applications and bursty HPC-style workflows on an OpenStack-based Kubernetes deployment, while keeping the job management, logging, monitoring, and storage infrastructure homogeneous. Moreover, the implementation can be adapted to other workload managers with relatively little effort.

Keywords

Cloud computing, HPC

Notes

Acknowledgements

This work was partially supported by the TDM project funded by Sardinian Regional Authorities under grant agreement POR FESR 2014-2020 Azione 1.2 (D. 66/14 13.12.2016 S3-ICT).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Marco Enrico Piras (1)
  • Luca Pireddu (1)
  • Marco Moro (1)
  • Gianluigi Zanetti (1)
  1. CRS4, Pula, Italy
