Abstract
Data science, a new discovery paradigm, is potentially one of the most significant advances of the early twenty-first century. Originating in scientific discovery, it is being applied to every human endeavor for which there is adequate data. While remarkable successes have been achieved, even greater claims have been made. Benefits, challenges, and risks abound. The science underlying data science has yet to emerge. Maturity is more than a decade away. This claim is based first on observing the centuries-long development of its predecessor paradigms: empirical, theoretical, and Jim Gray’s Fourth Paradigm of Scientific Discovery (Hey et al. (Eds.), The fourth paradigm: Data-intensive scientific discovery, Microsoft Research, 2009), also known as eScience or data-intensive, computational, or procedural discovery. It is based second on my studies of over 150 data science use cases and several data science-based startups, and on my scientific advisory role for Insight (https://www.insight-centre.org/), a Data Science Research Institute (DSRI), a role that requires that I understand the opportunities, state of the art, and research challenges for the emerging discipline of data science. This chapter addresses essential questions for a DSRI: What is data science? What is world-class data science research? A companion chapter (Brodie, On Developing Data Science, in Braschler et al. (Eds.), Applied data science – Lessons learned for the data-driven business, Springer 2019) addresses the development of data science applications and of the data science discipline itself.
References
Brodie, M. L. (2014a, June). The first law of data science: Do umbrellas cause rain? KDnuggets.
Brodie, M. L. (2014b, October). Piketty revisited: Improving economics through data science – How data curation can enable more faithful data science (in much less time). KDnuggets.
Brodie, M. L. (2015a, June). Understanding data science: An emerging discipline for data-intensive discovery. In S. Cutt (Ed.), Getting data right: Tackling the challenges of big data volume and variety. Sebastopol, CA: O’Reilly Media.
Brodie, M. L. (2015b, July). Doubt and verify: Data science power tools. KDnuggets. Republished on ODBMS.org.
Brodie, M. L. (2015c, November). On political economy and data science: When a discipline is not enough. KDnuggets. Republished ODBMS.org November 20, 2015.
Brodie, M. L. (2018, January 1). Why understanding truth is important in data science? KDnuggets. Republished Experfy.com, February 16, 2018.
Brodie, M. L. (2019). On developing data science, to appear. In M. Braschler, T. Stadelmann, & K. Stockinger (Eds.), Applied data science – Lessons learned for the data-driven business. Berlin: Springer.
Cambridge Mobile Telematics. (2018, April 2). Distraction 2018: Data from over 65 million trips shows that distracted driving is increasing.
Castanedo, F. (2015, August). Data preparation in the big data era: Best practices for data integration. Boston: O’Reilly.
Dasu, T., & Johnson, T. (2003). Exploratory data mining and cleaning. Hoboken, NJ: Wiley-IEEE.
Data Science. (2018). Opportunities to transform chemical sciences and engineering. A Chemical Sciences Roundtable Workshop, National Academies of Science, February 27–28, 2018.
Demirkan, H., & Dal, B. (2014, July/August). The data economy: Why do so many analytics projects fail? Analytics Magazine.
Dietterich, T. G. (2000, June). Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. 1–15). Berlin: Springer.
Dingus, T. A., et al. (2016). Driver crash risk factors and prevalence evaluation using naturalistic driving data. Proceedings of the National Academy of Sciences, 113(10), 2636–2641. https://doi.org/10.1073/pnas.1513271113.
Economist. (2017a, April 12). How Germany’s Otto uses artificial intelligence. The Economist.
Economist. (2017b, May 4). The World’s most valuable resource. The Economist.
Economist. (2018a, January 6). Many happy returns: New data reveal long-term investment trends. The Economist.
Economist. (2018b, February 24). Economists cannot avoid making value judgments: Lessons from the “repugnant” market for organs. The Economist.
Economist. (2018c, March 28). In algorithms we trust: How AI is spreading throughout the supply chain. The Economist.
Eriksson, J., Girod, L., Hull, B., Newton, R., Madden, S., & Balakrishnan, H. (2008). The pothole patrol: Using a mobile sensor network for road surface monitoring. In Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services (MobiSys ’08). ACM, New York, NY.
Forrester. (2015, November 9). Predictions 2016: The path from data to action for marketers: How marketers will elevate systems of insight. Forrester Research.
Forrester. (2017, March 7). The Forrester wave: Predictive analytics and machine learning solutions, Q1 2017.
Gartner G00301536. (2017, February 14). 2017 magic quadrant for data science platforms.
Gartner G00310700. (2016, September 19). Survey analysis: Big data investments begin tapering in 2016. Gartner.
Gartner G00315888. (2017, December 14). Market guide for data preparation. Gartner.
Gartner G00326671. (2017, June 7). Critical capabilities for data science platforms. Gartner.
Hey, T., Tansley, S., & Tolle, K. (Eds.). (2009). The fourth paradigm: Data-intensive scientific discovery. Microsoft Research.
Jenkins, J. M., Caldwell, D. A., Chandrasekaran, H., Twicken, J. D., Bryson, S. T., Quintana, E. V., et al. (2010). Overview of the Kepler science processing pipeline. The Astrophysical Journal Letters, 713(2), L87.
Liu, J. T. (2012). Shadow theory, data model design for data integration. CoRR. arXiv:1209.2647.
Lohr, S. (2014, August 17). For big-data scientists, ‘Janitor Work’ is key hurdle to insights. New York Times.
Lohr, S., & Singer, N. (2016, November 10). How data failed us in calling an election. The New York Times.
Mayo, M. (2017, May 31). Data preparation tips, tricks, and tools: An interview with the insiders. KDnuggets.
Nagarajan, M., et al. (2015). Predicting future scientific discoveries based on a networked analysis of the past literature. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15). ACM, New York, NY, pp. 2019–2028.
NSF. (2016, December). Realizing the potential of data science. Final Report from the National Science Foundation Computer and Information Science and Engineering Advisory Committee Data Science Working Group.
Pearl, J. (2009a). Causality: Models, reasoning, and inference. New York: Cambridge University Press.
Pearl, J. (2009b). Epilogue: The art and science of cause and effect. In J. Pearl (Ed.), Causality: Models, reasoning, and inference (pp. 401–428). New York: Cambridge University Press.
Pearl, J. (2009c). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146.
Piketty, T. (2014). Capital in the 21st century. Cambridge: The Belknap Press.
Press, G. (2016, May 23). Cleaning big data: Most time-consuming, least enjoyable data science task, survey says. Forbes.
Reimsbach-Kounatze, C. (2015). The proliferation of “big data” and implications for official statistics and statistical agencies: A preliminary analysis. OECD Digital Economy Papers, No. 245, OECD Publishing, Paris. https://doi.org/10.1787/5js7t9wqzvg8-en
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., et al. (2017). Mastering chess and Shogi by self-play with a general reinforcement learning algorithm. ArXiv E-Prints, cs.AI.
Singh, G., et al. (2007). Optimizing workflow data footprint. Scientific Programming, special issue on dynamic computational workflows: Discovery, optimisation and scheduling.
Spangler, S., et al. (2014). Automated hypothesis generation based on mining scientific literature. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14). ACM, New York, NY, pp. 1877–1886.
Stoica, I., et al. (2017, October 16). A Berkeley view of systems challenges for AI. Technical Report No. UCB/EECS-2017-159.
Thakur, A. (2016, July 21). Approaching (almost) any machine learning problem. The Official Blog of Kaggle.com.
Veeramachaneni, K. (2016, December 7). Why you’re not getting value from your data science. Harvard Business Review.
Waller, M. A., & Fawcett, S. E. (2013). Data science, predictive analytics, and big data: A revolution that will transform supply chain design and management. Journal of Business Logistics, 34, 77–84. https://doi.org/10.1111/jbl.12010.
Winship, C., & Morgan, S. L. (1999). The estimation of causal effects from observational data. Annual Review of Sociology, 25(1), 659–706. https://doi.org/10.1146/annurev.soc.25.1.659.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Brodie, M.L. (2019). What Is Data Science?. In: Braschler, M., Stadelmann, T., Stockinger, K. (eds) Applied Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-11821-1_8
DOI: https://doi.org/10.1007/978-3-030-11821-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11820-4
Online ISBN: 978-3-030-11821-1
eBook Packages: Computer Science, Computer Science (R0)