What Is Data Science?

Abstract

Data science, a new discovery paradigm, is potentially one of the most significant advances of the early twenty-first century. Originating in scientific discovery, it is being applied to every human endeavor for which there is adequate data. While remarkable successes have been achieved, even greater claims have been made. Benefits, challenges, and risks abound. The science underlying data science has yet to emerge. Maturity is more than a decade away. This claim is based firstly on observing the centuries-long developments of its predecessor paradigms: empirical, theoretical, and Jim Gray’s Fourth Paradigm of Scientific Discovery (Hey et al., The fourth paradigm: data-intensive scientific discovery, Microsoft Research, 2009) (aka eScience, data-intensive, computational, procedural). It is based secondly on my studies of over 150 data science use cases and several data science-based startups, and on my scientific advisory role for Insight (https://www.insight-centre.org/), a Data Science Research Institute (DSRI), a role that requires me to understand the opportunities, state of the art, and research challenges of the emerging discipline of data science. This chapter addresses essential questions for a DSRI: What is data science? What is world-class data science research? A companion chapter (Brodie, On Developing Data Science, in Braschler et al. (Eds.), Applied data science – Lessons learned for the data-driven business, Springer 2019) addresses the development of data science applications and of the data science discipline itself.

References

  • Brodie, M. L. (2014a, June). The first law of data science: Do umbrellas cause rain? KDnuggets.

  • Brodie, M. L. (2014b, October). Piketty revisited: Improving economics through data science – How data curation can enable more faithful data science (in much less time). KDnuggets.

  • Brodie, M. L. (2015a, June). Understanding data science: An emerging discipline for data-intensive discovery. In S. Cutt (Ed.), Getting data right: Tackling the challenges of big data volume and variety. Sebastopol, CA: O’Reilly Media.

  • Brodie, M. L. (2015b, July). Doubt and verify: Data science power tools. KDnuggets. Republished on ODBMS.org.

  • Brodie, M. L. (2015c, November). On political economy and data science: When a discipline is not enough. KDnuggets. Republished ODBMS.org November 20, 2015.

  • Brodie, M. L. (2018, January 1). Why understanding truth is important in data science? KDnuggets. Republished Experfy.com, February 16, 2018.

  • Brodie, M. L. (2019). On developing data science, to appear. In M. Braschler, T. Stadelmann, & K. Stockinger (Eds.), Applied data science – Lessons learned for the data-driven business. Berlin: Springer.

  • Cambridge Mobile Telematics. (2018, April 2). Distraction 2018: Data from over 65 million trips shows that distracted driving is increasing.

  • Castanedo, F. (2015, August). Data preparation in the big data era: Best practices for data integration. Boston: O’Reilly.

  • Dasu, T., & Johnson, T. (2003). Exploratory data mining and cleaning. Hoboken, NJ: Wiley-IEEE.

  • Data Science. (2018). Opportunities to transform chemical sciences and engineering. A Chemical Sciences Roundtable Workshop, National Academies of Science, February 27–28, 2018.

  • Demirkan, H., & Dal, B. (2014, July/August). The data economy: Why do so many analytics projects fail? Analytics Magazine.

  • Dietterich, T. G. (2000, June). Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. 1–15). Berlin: Springer.

  • Dingus, T. A., et al. (2016). Driver crash risk factors and prevalence evaluation using naturalistic driving data. Proceedings of the National Academy of Sciences, 113(10), 2636–2641. https://doi.org/10.1073/pnas.1513271113.

  • Economist. (2017a, April 12). How Germany’s Otto uses artificial intelligence. The Economist.

  • Economist. (2017b, May 4). The World’s most valuable resource. The Economist.

  • Economist. (2018a, January 6). Many happy returns: New data reveal long-term investment trends. The Economist.

  • Economist. (2018b, February 24). Economists cannot avoid making value judgments: Lessons from the “repugnant” market for organs. The Economist.

  • Economist. (2018c, March 28). In algorithms we trust: How AI is spreading throughout the supply chain. The Economist.

  • Eriksson, J., Girod, L., Hull, B., Newton, R., Madden, S., & Balakrishnan, H. (2008). The pothole patrol: Using a mobile sensor network for road surface monitoring. In Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services (MobiSys ’08). ACM, New York, NY.

  • Forrester. (2015, November 9). Predictions 2016: The path from data to action for marketers: How marketers will elevate systems of insight. Forrester Research.

  • Forrester. (2017, March 7). The Forrester wave: Predictive analytics and machine learning solutions, Q1 2017.

  • Gartner G00301536. (2017, February 14). 2017 magic quadrant for data science platforms.

  • Gartner G00310700. (2016, September 19). Survey analysis: Big data investments begin tapering in 2016. Gartner.

  • Gartner G00315888. (2017, December 14). Market guide for data preparation. Gartner.

  • Gartner G00326671. (2017, June 7). Critical capabilities for data science platforms. Gartner.

  • Hey, T., Tansley, S., & Tolle, K. (Eds.). (2009). The fourth paradigm: Data-intensive scientific discovery. Redmond, WA: Microsoft Research.

  • Jenkins, J. M., Caldwell, D. A., Chandrasekaran, H., Twicken, J. D., Bryson, S. T., Quintana, E. V., et al. (2010). Overview of the Kepler science processing pipeline. The Astrophysical Journal Letters, 713(2), L87.

  • Liu, J. T. (2012). Shadow theory, data model design for data integration. CoRR, 1209, 2012. arXiv:1209.2647.

  • Lohr, S. (2014, August 17). For big-data scientists, ‘Janitor Work’ is key hurdle to insights. New York Times.

  • Lohr, S., & Singer, N. (2016). How data failed us in calling an election. The New York Times, 10, 2016.

  • Mayo, M. (2017, May 31). Data preparation tips, tricks, and tools: An interview with the insiders. KDnuggets.

  • Nagarajan, M. et al. (2015). Predicting future scientific discoveries based on a networked analysis of the past literature. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15). ACM, New York, NY, pp. 2019–2028.

  • NSF. (2016, December). Realizing the potential of data science. Final Report from the National Science Foundation Computer and Information Science and Engineering Advisory Committee Data Science Working Group.

  • Pearl, J. (2009a). Causality: Models, reasoning, and inference. New York: Cambridge University Press.

  • Pearl, J. (2009b). Epilogue: The art and science of cause and effect. In J. Pearl (Ed.), Causality: Models, reasoning, and inference (pp. 401–428). New York: Cambridge University Press.

  • Pearl, J. (2009c). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146.

  • Piketty, T. (2014). Capital in the 21st century. Cambridge: The Belknap Press.

  • Press, G. (2016, May 23). Cleaning big data: Most time-consuming, least enjoyable data science task, survey says. Forbes.

  • Reimsbach-Kounatze, C. (2015). The proliferation of “big data” and implications for official statistics and statistical agencies: A preliminary analysis. OECD Digital Economy Papers, No. 245, OECD Publishing, Paris. https://doi.org/10.1787/5js7t9wqzvg8-en

  • Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., et al. (2017). Mastering chess and Shogi by self-play with a general reinforcement learning algorithm. ArXiv E-Prints, cs.AI.

  • Singh, G., et al. (2007). Optimizing workflow data footprint. Scientific Programming, special issue dedicated to dynamic computational workflows: Discovery, optimisation and scheduling.

  • Spangler, S., et al. (2014). Automated hypothesis generation based on mining scientific literature. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14). ACM, New York, NY, pp. 1877–1886.

  • Stoica, I., et al. (2017, October 16). A Berkeley view of systems challenges for AI. Technical Report No. UCB/EECS-2017-159.

  • Thakur, A. (2016, July 21). Approaching (almost) any machine learning problem. The Official Blog of Kaggle.com.

  • Veeramachaneni, K. (2016, December 7). Why you’re not getting value from your data science. Harvard Business Review.

  • Waller, M. A., & Fawcett, S. E. (2013). Data science, predictive analytics, and big data: A revolution that will transform supply chain design and management. Journal of Business Logistics, 34, 77–84. https://doi.org/10.1111/jbl.12010.

  • Winship, C., & Morgan, S. L. (1999). The estimation of causal effects from observational data. Annual Review of Sociology, 25(1), 659–706. https://doi.org/10.1146/annurev.soc.25.1.659.

Author information

Correspondence to Michael L. Brodie.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Brodie, M.L. (2019). What Is Data Science?. In: Braschler, M., Stadelmann, T., Stockinger, K. (eds) Applied Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-11821-1_8

  • DOI: https://doi.org/10.1007/978-3-030-11821-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11820-4

  • Online ISBN: 978-3-030-11821-1

  • eBook Packages: Computer Science (R0)
