STARE into the future of GeoData integrative analysis

Abstract

Different kinds of observations feature different strengths, e.g. visible-infrared imagery for clouds and radar for precipitation, and, when integrated, better constrain scientific models and hypotheses. Even critical, fundamental operations such as cross-calibrations of related sensors operating on different platforms or orbits, e.g. spacecraft and aircraft, are integrative analyses. The great variety of Earth Science data types and the spatiotemporal irregularity of important low-level (ungridded) data have so far made their integration a customized, tedious process that scales in neither variety nor volume. Generic, higher-level (gridded) data products are easier to use, at the cost of being farther from the original observations and having to settle for grids, interpolation assumptions, and uncertainties that limit their applicability. The root cause of the difficulty in scalably bringing together diverse data is the current rectilinear geo-partitioning of Earth Science data into conventional arrays indexed using consecutive integer indices and then packaged into files. Such indices suffice for archival, search, and retrieval, but they lack common geospatial semantics, a gap that is mitigated by attaching floating-point encoded longitude-latitude information for registration. As an alternative to floating-point coordinates, the SpatioTemporal Adaptive Resolution Encoding (STARE) provides an integer encoding for geo-spatiotemporal location and neighborhood that transcends the use of files and native array indexing, allowing diverse data to be organized on scalable, distributed computing and storage platforms.
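To make the idea of an integer encoding for location and neighborhood concrete, the following is a minimal sketch of a hierarchical integer index. It is not the STARE encoding itself: it assumes a toy quadtree over a longitude-latitude box, and the names `encode`, `level_of`, `contains`, `MAX_LEVEL`, and `LEVEL_BITS` are hypothetical, chosen only for illustration. It shows how, once location and resolution are packed into one integer, a question such as "does this coarse cell contain that fine observation?" reduces to integer shifts and comparisons, independent of the files or native array indices the data came from.

```python
# Toy hierarchical integer index (an illustrative assumption, NOT the STARE scheme).
MAX_LEVEL = 27    # assumed maximum depth: 2 bits per level gives 54 path bits
LEVEL_BITS = 5    # assumed number of low bits reserved for the resolution level


def encode(lon: float, lat: float, level: int) -> int:
    """Pack a quadtree path for (lon, lat) and its resolution level into one integer."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError("level out of range")
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    path = 0
    for _ in range(level):
        lon_mid = (lon_lo + lon_hi) / 2.0
        lat_mid = (lat_lo + lat_hi) / 2.0
        quad = 0
        if lon >= lon_mid:
            quad |= 1
            lon_lo = lon_mid
        else:
            lon_hi = lon_mid
        if lat >= lat_mid:
            quad |= 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        path = (path << 2) | quad
    # Left-align the path so a coarse cell shares its high bits with every
    # finer cell nested inside it.
    path <<= 2 * (MAX_LEVEL - level)
    return (path << LEVEL_BITS) | level


def level_of(index: int) -> int:
    """Recover the resolution level stored in the low bits."""
    return index & ((1 << LEVEL_BITS) - 1)


def contains(coarse: int, fine: int) -> bool:
    """True if the cell `coarse` spatially contains the cell `fine`."""
    coarse_level = level_of(coarse)
    if coarse_level > level_of(fine):
        return False
    # Compare only the path bits that are meaningful at the coarser level.
    shift = LEVEL_BITS + 2 * (MAX_LEVEL - coarse_level)
    return (coarse >> shift) == (fine >> shift)


# Two hypothetical observations near the same place at different resolutions,
# plus one far away.
radar_cell = encode(-76.85, 38.99, 6)
imager_pixel = encode(-76.85, 38.99, 12)
distant_pixel = encode(139.6, 35.4, 12)

print(contains(radar_cell, imager_pixel))
print(contains(radar_cell, distant_pixel))
```

Because the comparison needs only the two integers, data from different instruments could in principle be co-located, sorted, or partitioned by index value alone; a production encoding would also need to handle the sphere's geometry and the time dimension, which this sketch ignores.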

Acknowledgments

We are grateful for the support provided by the National Aeronautics and Space Administration Advancing Collaborative Connections for Earth System Science (ACCESS-17) program, award ID 80NSSC18M0118. We gratefully acknowledge helpful comments from this paper’s reviewers.

Author information

Corresponding author

Correspondence to Michael L. Rilee.

About this article

Cite this article

Rilee, M.L., Kuo, KS., Frew, J. et al. STARE into the future of GeoData integrative analysis. Earth Sci Inform (2021). https://doi.org/10.1007/s12145-021-00568-8

Keywords

  • STARE
  • Big data
  • Geolocation
  • DGGS
  • Data fusion
  • Integration