Towards High Performance Data Analytics for Climate Change

Fiore, Sandro; Elia, Donatello; Palazzo, Cosimo; Antonio, Fabrizio; D’Anca, Alessandro; Foster, Ian; Aloisio, Giovanni

doi:10.1007/978-3-030-34356-9_20

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

The continuous growth in the data produced by simulations, experiments and edge components over the last few years has forced a shift in the scientific research process, leading to the definition of a fourth paradigm in science centered on data-intensive computing. This data deluge introduces challenges related to large data volumes, format heterogeneity and the speed of data production and gathering, all of which must be handled to effectively support scientific discovery. To this end, High Performance Computing (HPC) and data analytics are both considered fundamental and complementary aspects of the scientific process, and together they contribute to a new paradigm that combines the efforts of the two fields: High Performance Data Analytics (HPDA). In this context, the Ophidia project provides an HPDA framework that joins the HPC paradigm with scientific data analytics. This contribution presents some aspects of the Ophidia HPDA framework, namely its multidimensional storage model and its distributed and hierarchical implementation, along with a benchmark of a parallel in-memory time series reduction operator.
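
To make the idea of a time series reduction over a partitioned multidimensional dataset more concrete, the following is a minimal sketch in plain Python. It is not Ophidia's implementation or API: the toy datacube, the round-robin fragmentation and all function names (`make_cube`, `split_into_fragments`, `reduce_fragment`, `reduce_time`) are assumptions introduced only for illustration. Each (lat, lon) cell holds one time series, the cells are split into independent fragments, and each fragment is reduced along the time dimension by a separate worker.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def make_cube(n_lat, n_lon, n_time):
    """Build a hypothetical toy datacube: (lat, lon) cell -> time series."""
    return {
        (i, j): [float(i + j + t) for t in range(n_time)]
        for i in range(n_lat)
        for j in range(n_lon)
    }


def split_into_fragments(cube, n_fragments):
    """Partition the cells round-robin into independent fragments."""
    fragments = [dict() for _ in range(n_fragments)]
    for k, (cell, series) in enumerate(sorted(cube.items())):
        fragments[k % n_fragments][cell] = series
    return fragments


def reduce_fragment(fragment, operation):
    """Collapse every time series in one fragment to a single value."""
    return {cell: operation(series) for cell, series in fragment.items()}


def reduce_time(cube, operation, n_fragments=4):
    """Reduce the time dimension fragment by fragment, then merge the results."""
    fragments = split_into_fragments(cube, n_fragments)
    # Threads stand in for parallel workers here; a real HPDA framework would
    # distribute fragments across processes or nodes instead.
    with ThreadPoolExecutor(max_workers=n_fragments) as pool:
        partials = list(pool.map(lambda f: reduce_fragment(f, operation), fragments))
    result = {}
    for partial in partials:
        result.update(partial)
    return result


cube = make_cube(n_lat=2, n_lon=3, n_time=4)
time_max = reduce_time(cube, max)    # maximum over time for each cell
time_avg = reduce_time(cube, fmean)  # average over time for each cell
```

Because every fragment is reduced independently, the only coordination required is the final merge, which is what makes this kind of operator a natural candidate for parallel execution.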

Notes

1. OPH_REDUCE2 documentation http://ophidia.cmcc.it/documentation/users/operators/OPH_REDUCE2.html
2. GlusterFS documentation https://docs.gluster.org/en/latest/
3. ICCLIM (Indice Calculation CLIMate) https://icclim.readthedocs.io/en/latest/intro.html
4. NCAR command language https://www.ncl.ucar.edu/
5. PyOphidia - Conda Forge https://anaconda.org/conda-forge/pyophidia
6. Dask, library for dynamic task scheduling https://dask.org
7. Pangeo. A community platform for big data geoscience https://pangeo.io/
8. The ESiWACE Center of Excellence on Weather and Climate Simulations in Europe project https://www.esiwace.eu/
9. ESiWACE Earth System Data Middleware https://github.com/ESiWACE/esdm

Acknowledgments

This work was supported in part by the EU H2020 Excellence in SImulation of Weather and Climate in Europe (ESiWACE) project (Grant Agreement 675191). Moreover, the authors would like to acknowledge Antonio Aloisio for his editing and proofreading work on this paper.

Author information

Authors and Affiliations

Euro-Mediterranean Center on Climate Change Foundation, Lecce, Italy
Sandro Fiore, Donatello Elia, Cosimo Palazzo, Fabrizio Antonio, Alessandro D’Anca & Giovanni Aloisio
University of Salento, Lecce, Italy
Donatello Elia & Giovanni Aloisio
University of Chicago & Argonne National Laboratory, Chicago, USA
Ian Foster

Corresponding author

Correspondence to Sandro Fiore.

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, UK
Michèle Weiland
Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Sachsen, Germany
Guido Juckeland
Swiss National Supercomputing Centre, Lugano, Ticino, Switzerland
Sadaf Alam
University of Tennessee at Knoxville, Knoxville, TN, USA
Heike Jagode

About this paper

Cite this paper

Fiore, S. et al. (2019). Towards High Performance Data Analytics for Climate Change. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_20

DOI: https://doi.org/10.1007/978-3-030-34356-9_20
Published: 03 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science, Computer Science (R0)
