Accelerating Experimental Science Using Jupyter and NERSC HPC
Large-scale experimental science workflows require a unified, interactive, real-time platform that can manage a distributed set of resources connected to High Performance Computing (HPC) systems. What is needed is a tool that provides the ease of use and interactivity of a web science gateway while giving scientists the ability to build custom, ad-hoc workflows in a composable way. The Jupyter platform can play a key role here: it can enable the ingestion and analysis of real-time streaming data, integrate with HPC resources in a closed loop, and support interactive ad-hoc analyses alongside running workflows.
We want to enable high-quality, reproducible, human-in-the-loop science using HPC and Jupyter at the National Energy Research Scientific Computing Center (NERSC). Achieving that goal is challenging in the general case because scientific workflows and data can vary significantly in size and type between disciplines. Highly reproducible science, let alone human-in-the-loop interactive scientific workflows, requires work in many areas. Here we focus on a few basic elements of an improved interactive HPC experience: creating reusable recipes and workflows with Notebooks, sharing and cloning Notebooks, and parallelizing and scaling HPC-sized scientific code from Jupyter.
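The combination of reusable, parameterized recipes and parallel scaling mentioned above can be illustrated with a small sketch. It is not the authors' implementation. `RecipeParams`, `analysis_recipe`, and `run_sweep` are hypothetical names, the analysis is a toy, and a local thread pool stands in for HPC workers. The idea is that a notebook's tunable values are gathered in one place, much as papermill does with a cell tagged "parameters", so the same recipe can be rerun unchanged or fanned out over a parameter sweep.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
import statistics


@dataclass(frozen=True)
class RecipeParams:
    """Hypothetical parameters for a reusable analysis recipe.

    In a parameterized notebook these values would sit in a single
    "parameters" cell that a runner overrides for each execution.
    """
    threshold: float = 0.5
    scale: float = 1.0


def analysis_recipe(data, params=RecipeParams()):
    """Hypothetical toy recipe: scale the data, keep values above a
    threshold, and return a small summary dictionary."""
    scaled = [x * params.scale for x in data]
    kept = [x for x in scaled if x > params.threshold]
    return {
        "threshold": params.threshold,
        "scale": params.scale,
        "n_kept": len(kept),
        "mean_kept": statistics.fmean(kept) if kept else None,
    }


def run_sweep(data, param_sets, max_workers=4):
    """Run the same recipe once per parameter set, in parallel.

    A local thread pool is an assumption standing in for cluster workers;
    results are returned in the same order as param_sets.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: analysis_recipe(data, p), param_sets))


if __name__ == "__main__":
    data = [0.1, 0.4, 0.6, 0.9, 1.2]
    sweep = [RecipeParams(threshold=t) for t in (0.3, 0.5, 1.0)]
    for result in run_sweep(data, sweep):
        print(result)
```

Because `run_sweep` only relies on an executor's `map`, the same shape could in principle be pointed at a futures-based interface backed by HPC workers (Dask and ipyparallel both offer such interfaces) without changing the recipe itself.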
Keywords: HPC, Interactive, Jupyter, Scientific workflows, Reuse, Parameters
This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.
We wish to thank the Jupyter team; Colin Ophus, Benjamin Savitzky, and Steven Zeltmann at NCEM; and Dilworth Parkinson at ALS Beamline 8.3.2. We would also like to thank Lindsey Heagy for the geoscience Notebook example.