An Insight into Theory-Guided Climate Data Science—A Literature Review

  • Rafiya Sheikh
  • Sunita Jahirabadkar
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 38)


Data science models, though successful in many commercial domains, have found limited application in scientific problems that involve complex physical phenomena. Most of these problems comprise multi-spectral data composites. Climate science and hydrology form one such scientific domain, and it faces several big data challenges. Climate data is challenging for research because of its spatiotemporal characteristics, its high degree of variance, and, above all, its physical nature. Precipitation data is one such challenging kind of data in climate science and hydrology. It is vast and generated at a fast pace from several sources, but because they lack underlying principles, the data science models that address climatic issues such as precipitation are dysfunctional. These challenges call for a novel approach that integrates domain knowledge with data science models. To that end, the paper surveys the evolving paradigm of theory-guided data science (TGDS), a new paradigm in data science and analytics that aims to improve the generalization of data science models and their effectiveness in scientific discovery. Through the survey, the authors present the challenges posed by climate data, which are representative of those posed by precipitation data, and the limitations of traditional data science methods. The paper suggests a shift in data science practices toward adopting theory-guided data science for precipitation data in the climate and hydrology domain, and provides insights into TGDS, its models, and its approaches.


Data science, Theory-guided, Knowledge discovery, Climate change, Climate science, Precipitation



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Department of Computer Engineering, M.K.S.S.S. Cummins College of Engineering for Women, Pune, India
