The Pangeo Ecosystem: Interactive Computing Tools for the Geosciences: Benchmarking on HPC

  • Tina Erica Odaka
  • Anderson Banihirwe
  • Guillaume Eynard-Bontemps
  • Aurelien Ponte
  • Guillaume Maze
  • Kevin Paul
  • Jared Baker
  • Ryan Abernathey
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1190)


The Pangeo ecosystem is an interactive computing software stack for HPC and public cloud infrastructures. In this paper, we present benchmarking results for the Pangeo platform on two different HPC systems. Four geoscience operations were considered in this benchmarking study, with varying chunk sizes and chunking schemes, and both strong and weak scaling analyses were performed. Chunk sizes from 64 MB to 512 MB were considered, with the best scalability obtained at 512 MB. The auto chunking scheme scaled well compared with certain manual chunking schemes.
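The chunking schemes compared in the abstract can be illustrated with Dask, the parallel-computing library underlying the Pangeo stack. The sketch below is a minimal, hypothetical example (the array shape and chunk layout are assumptions, not the paper's actual benchmark datasets): a manual chunk shape is specified explicitly, while `chunks="auto"` lets Dask choose chunk shapes near its configured target chunk size.

```python
# Minimal sketch of manual vs. auto chunking with Dask arrays.
# Shapes here are illustrative only, standing in for a (time, lat, lon)
# geoscience variable; they are not the paper's benchmark data.
import dask.array as da

shape = (1024, 512, 512)  # synthetic 3-D array, float64 by default

# Manual chunking: an explicit chunk shape. With float64 values,
# each chunk is 64 * 512 * 512 * 8 bytes = 128 MiB.
manual = da.zeros(shape, chunks=(64, 512, 512))

# Auto chunking: Dask picks chunk shapes targeting its configured
# chunk-size limit (the "array.chunk-size" setting in dask.config).
auto = da.zeros(shape, chunks="auto")

print(manual.chunksize)  # (64, 512, 512)
```

Larger chunks amortize per-task scheduling overhead (consistent with the best scalability at 512 MB), while overly large chunks reduce available parallelism, so the target chunk size is a tunable trade-off.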


Keywords: Pangeo · Interactive computing · HPC · Cloud · Benchmarking · Dask · Xarray



Dr. Abernathey was supported by NSF Earthcube award 1740648. Dr. Paul and Mr. Banihirwe were both supported by NSF Earthcube award 1740633.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Laboratory for Ocean Physics and Satellite Remote Sensing UMR LOPS, Ifremer, Univ. Brest, CNRS, IRD, IUEM, Brest, France
  2. National Center for Atmospheric Research, Boulder, USA
  3. CNES Computing Center Team, Centre National d'Etudes Spatiales, Toulouse, France
  4. Lamont-Doherty Earth Observatory, Columbia University, New York, USA
