Accelerating Experimental Science Using Jupyter and NERSC HPC

  • Matthew L. Henderson
  • William Krinsman
  • Shreyas Cholia
  • Rollin Thomas
  • Trevor Slaton
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1190)

Abstract

Large-scale experimental science workflows require support for a unified, interactive, real-time platform that can manage a distributed set of resources connected to High Performance Computing (HPC) systems. What is needed is a tool that provides the ease of use and interactivity of a web science gateway while giving the scientist the ability to build custom, ad-hoc workflows in a composable way. The Jupyter platform can play a key role here by enabling the ingestion and analysis of real-time streaming data, integrating with HPC resources in a closed loop, and supporting interactive ad-hoc analyses alongside running workflows.

We want to enable high-quality, reproducible, human-in-the-loop science using HPC and Jupyter at the National Energy Research Scientific Computing Center (NERSC). Achieving that goal is challenging in the general case because scientific workflows and data can vary significantly in size and type between disciplines. Much work remains to achieve highly reproducible science, let alone human-in-the-loop interactive scientific workflows, so we focus here on some basic elements of an improved interactive HPC experience: creating reusable recipes and workflows with Notebooks, sharing and cloning Notebooks, and parallelizing and scaling, from within Jupyter, scientific code that requires HPC.
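
One way to turn a Notebook into a reusable recipe is to parameterize it: keep default values in a cell tagged "parameters" and, for each run, inject a cell with overriding values right after it. The sketch below illustrates that idea with the standard library only. It assumes Notebooks are stored in the nbformat v4 JSON layout, and it borrows the "parameters" / "injected-parameters" tag convention used by papermill. The helper names `inject_parameters` and `run_code_cells` are hypothetical, and `run_code_cells` is only a stand-in for a real Jupyter kernel. This is an illustration of the general technique, not the authors' implementation.

```python
import copy
import json


def inject_parameters(notebook: dict, params: dict) -> dict:
    """Return a copy of an nbformat-v4 notebook dict with a new code cell,
    tagged "injected-parameters", placed right after the "parameters" cell.
    Because the injected cell runs later, its values override the defaults."""
    nb = copy.deepcopy(notebook)  # never mutate the reusable template
    source = "\n".join(f"{name} = {value!r}" for name, value in params.items())
    new_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    }
    cells = nb["cells"]
    # Locate the first cell whose metadata tags include "parameters".
    index = next(
        (i for i, cell in enumerate(cells)
         if "parameters" in cell.get("metadata", {}).get("tags", [])),
        None,
    )
    # Without a defaults cell, place the injected values at the very top.
    cells.insert(0 if index is None else index + 1, new_cell)
    return nb


def run_code_cells(notebook: dict) -> dict:
    """Execute code cells in order in one shared namespace (kernel stand-in)."""
    namespace: dict = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]
            if isinstance(src, list):  # nbformat allows a list of lines
                src = "".join(src)
            exec(src, namespace)
    return namespace


# A tiny hypothetical template: one defaults cell, one analysis cell.
template = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {"tags": ["parameters"]},
         "source": "scan_id = 0\nthreshold = 0.5\n"},
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {},
         "source": "result = f'scan {scan_id} at threshold {threshold}'\n"},
    ],
}

run = inject_parameters(template, {"scan_id": 42, "threshold": 0.8})
ns = run_code_cells(run)
print(ns["result"])  # scan 42 at threshold 0.8

# The parameterized copy is still plain JSON and could be saved as a new .ipynb.
ipynb_text = json.dumps(run, indent=1)
```

Keeping the template untouched and writing each parameterized run out as its own Notebook means every run leaves behind a shareable, re-executable record of the exact inputs used, which is the property that makes recipes reusable and cloneable.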

Keywords

HPC, Interactive, Jupyter, Scientific workflows, Reuse, Parameters

Notes

Acknowledgements

This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.

We wish to thank the Jupyter team; Colin Ophus, Benjamin Savitzky, and Steven Zeltmann at NCEM; and Dilworth Parkinson at ALS Beamline 8.3.2. We would also like to thank Lindsey Heagy for the geoscience Notebook example.

Copyright information

© 2020. This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.

Authors and Affiliations

  • Matthew L. Henderson (1, 3), corresponding author
  • William Krinsman (1, 3)
  • Shreyas Cholia (1, 2, 3)
  • Rollin Thomas (2, 3)
  • Trevor Slaton (2, 3)

  1. Computational Research Division, Berkeley, USA
  2. NERSC, Berkeley, USA
  3. Lawrence Berkeley National Laboratory, Berkeley, USA