On Developing Data Science

  • Michael L. Brodie
Chapter

Abstract

Understanding phenomena based on the facts—on the data—is a touchstone of data science. The power of evidence-based, inductive reasoning distinguishes data science from science. Hence, this chapter argues that, in its initial stages, data science applications and the data science discipline itself be developed inductively and deductively in a virtuous cycle.

The twentieth-century Virtuous Cycle (aka the virtuous hardware-software cycle or Intel-Microsoft virtuous cycle), which built the personal computer industry (National Research Council, The new global ecosystem in advanced computing: Implications for U.S. competitiveness and national security. The National Academies Press, Washington, DC, 2012), had two virtues: it was grounded in reality and it was self-perpetuating. More powerful hardware enabled more powerful software that required more powerful hardware, enabling yet more powerful software, and so forth. Being grounded in reality, that is, solving genuine problems at scale, was critical to its success, as it will be for data science. While it lasted, the cycle was self-perpetuating because of a constant flow of innovation and because it benefitted all participants: producers, consumers, the industry, the economy, and society. It is a wonderful success story for twentieth-century applied science. Given the success of virtuous cycles in developing modern technology, virtuous cycles grounded in reality should be used to develop data science, driven by the wisdom of the sixteenth-century proverb "Necessity is the mother of invention."

This chapter explores this hypothesis using the example of the evolution of database management systems over the last 40 years. For the application of data science to be successful and virtuous, it should be grounded in a cycle that encompasses industry (i.e., real problems), research, development, and delivery. This chapter proposes applying the principles and lessons of the virtuous cycle to the development of data science applications; to the development of the data science discipline itself, for example, a data science method; and to the development of data science education; all focusing on the critical role of collaboration in data science research and management, thereby addressing the development challenges faced by the more than 150 Data Science Research Institutes (DSRIs) worldwide. A companion chapter (Brodie, What is Data Science, in Braschler et al. (Eds.), Applied data science – Lessons learned for the data-driven business, Springer, 2019) addresses essential questions that DSRIs should answer in preparation for the developments proposed here: What is data science? What is world-class data science research?

Notes

Acknowledgments

Thanks to Dr. Thilo Stadelmann, Zurich University of Applied Sciences, Institute for Applied Information Technology in the Swiss Fachhochschule system, for insights into these ideas; and to Dr. He H. (Anne) Ngu, Texas State University, for insights into applying these principles and pragmatics to the development of Texas State University’s Twenty-First Century Applied PhD Program in Computer Science.

References

  1. ACM. (2015). Michael Stonebraker, 2014 Turing Award Citation, Association for Computing Machinery, April 2015. http://amturing.acm.org/award_winners/stonebraker_1172121.cfm
  2. AJTR. (2018). American Journal of Translational Research, e-Century Publishing Corporation. http://www.ajtr.org
  3. Angwin, J., Larson, J., Mattu, S., Kirchner, L., Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks, ProPublica, May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
  4. Braschler, M., Stadelmann, T., & Stockinger, K. (Eds.). (2019). Applied data science – Lessons learned for the data-driven business. Berlin: Springer.
  5. Brodie, M. L. (2015). Understanding data science: An emerging discipline for data-intensive discovery. In S. Cutt (Ed.), Getting data right: Tackling the challenges of big data volume and variety. Sebastopol, CA: O’Reilly Media.
  6. Brodie, M. L. (2019a). What is data science? In M. Braschler, T. Stadelmann, & K. Stockinger (Eds.), Applied data science – Lessons learned for the data-driven business. Berlin: Springer.
  7. Brodie, M. L. (Ed.). (2019b, January). Making databases work: The pragmatic wisdom of Michael Stonebraker. ACM Books series (Vol. 22). San Rafael, CA: Morgan & Claypool.
  8. Chipman, I. (2016). How data analytics is going to transform all industries. Stanford Engineering Magazine, February 13, 2016.
  9. Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387.
  10. Davenport, T. H., & Patil, D. J. (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70–76.
  11. Demirkan, H., & Dal, B. (2014). The data economy: Why do so many analytics projects fail? Analytics Magazine, July/August 2014.
  12. Dohzen, T., Pamuk, M., Seong, S. W., Hammer, J., & Stonebraker, M. (2006). Data integration through transform reuse in the Morpheus project (pp. 736–738). ACM SIGMOD International Conference on Management of Data, Chicago, IL, June 27–29, 2006.
  13. Economist. (2017). Who’s afraid of disruption? The business world is obsessed with digital disruption, but it has had little impact on profits, The Economist, September 30, 2017.
  14. Economist. (March 2018a). GrAIt expectations, Special Report AI in Business, The Economist, March 31, 2018.
  15. Economist. (March 2018b). External providers: Leave it to the experts, Special report AI in business, The Economist, March 31, 2018.
  16. Economist. (March 2018c). The future: Two-faced, Special report AI in business, The Economist, March 31, 2018.
  17. Economist. (March 2018d). Supply chains: In algorithms we trust, Special report AI in business, The Economist, March 31, 2018.
  18. Economist. (March 2018e). America v China: The battle for digital supremacy: America’s technological hegemony is under threat from China, The Economist, March 15, 2018.
  19. Economist. (2018f). A study finds nearly half of jobs are vulnerable to automation, The Economist, April 24, 2018.
  20. Fang, F. C., & Casadevall, A. (2010). Lost in translation-basic science in the era of translational research. Infection and Immunity, 78(2), 563–566.
  21. Forrester. (2015a). Brief: Why data-driven aspirations fail. Forrester Research, Inc., October 7, 2015.
  22. Forrester. (2015b). Predictions 2016: The path from data to action for marketers: How marketers will elevate systems of insight. Forrester Research, November 9, 2015.
  23. Forrester. (2017). The Forrester Wave™: Data preparation tools, Q1 2017, Forrester, March 13, 2017.
  24. Gartner G00310700. (2016). Survey analysis: Big data investments begin tapering in 2016, Gartner, September 19, 2016.
  25. Gartner G00316349. (2016). Predicts 2017: Analytics strategy and technology, Gartner, report G00316349, November 30, 2016.
  26. Gartner G00301536. (2017). 2017 Magic quadrant for data science platforms, 14 February 2017.
  27. Gartner G00315888. (2017). Market guide for data preparation, Gartner, 14 December 2017.
  28. Gartner G00326671. (2017). Critical capabilities for data science platforms, Gartner, June 7, 2017.
  29. Gartner G00326456. (2018). Magic quadrant for data science and machine-learning platforms, 22 February 2018.
  30. Gartner G00326555. (2018). Magic quadrant for analytics and business intelligence platforms, 26 February 2018.
  31. Gartner G00335261. (2018). Critical capabilities for data science and machine learning platforms, 4 April 2018.
  32. Harari, Y. N. (2016). Homo Deus: A brief history of tomorrow, Random House, 2016.
  33. Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124.
  34. Lee, K-F., The real threat of artificial intelligence. New York Times, June 24, 2017.
  35. Lohr, S., & Singer, N. (2016). How data failed us in calling an election. New York Times, November 10, 2016.
  36. Marr, B. (2017). How big data is transforming every business, in every industry. Forbes.com, November 21, 2017.
  37. Meierhofer, J., Stadelmann, T., & Cieliebak, M. (2019). Data products. In M. Braschler, T. Stadelmann, & K. Stockinger (Eds.), Applied data science – Lessons learned for the data-driven business. Berlin: Springer.
  38. Nagarajan, M., et al. (2015). Predicting future scientific discoveries based on a networked analysis of the past literature. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15) (pp. 2019–2028). New York, NY: ACM.
  39. National Research Council. (2012). The new global ecosystem in advanced computing: Implications for U.S. competitiveness and national security. Washington, DC: The National Academies Press.
  40. Naumann, F. (2018). Genealogy of relational database management systems. Hasso-Plattner-Institut, Universität Potsdam. https://hpi.de/naumann/projects/rdbms-genealogy.html
  41. Nedelkoska, L., & Quintini, G. (2018) Automation, skills use and training. OECD Social, Employment and Migration Working Papers, No. 202, OECD Publishing, Paris, doi: https://doi.org/10.1787/2e2f4eea-en.
  42. New York Times. (2018). H&M, a Fashion Giant, has a problem: $4.3 Billion in unsold clothes. New York Times, March 27, 2018.
  43. O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. New York, NY: Crown Publishing Group.
  44. Olson, M. (2019). Stonebraker and open source, to appear in (Brodie 2019b).
  45. Palmer, A. (2019). How to create & run a Stonebraker Startup – The Real Story, to appear in (Brodie 2019b).
  46. Piatetsky, G. (2016). Trump, failure of prediction, and lessons for data scientists, KDnuggets, November 2016.
  47. Ramanathan, A. (2016). The data science delusion, Medium.com, November 18, 2016.
  48. Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). Boston, MA: Pearson Education.
  49. Spangler, S., et al. (2014). Automated hypothesis generation based on mining scientific literature. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14) (pp. 1877–1886). New York, NY: ACM.
  50. STM. (2018). Science Translational Medicine, a journal of the American Association for the Advancement of Science.
  51. Stonebraker, M. (2019a). How to start a company in 5 (not so) easy steps, to appear in (Brodie 2019b).
  52. Stonebraker, M. (2019b). Where do good ideas come from and how to exploit them? to appear in (Brodie 2019b).
  53. Stonebraker, M., & Kemnitz, G. (1991). The postgres next generation database management system. Communications of the ACM, 34(10), 78–92.
  54. Stonebraker, M., Wong, E., Kreps, P., & Held, G. (1976). The design and implementation of INGRES. ACM Transactions on Database Systems, 1(3), 189–222.
  55. Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., et al. (2005). C-store: A column-oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases, 2005.
  56. Stonebraker, M., Castro Fernandez, R., Deng, D., & Brodie, M. L. (2016a). Database decay and what to do about it. Communications of the ACM, 60(1), 10–11.
  57. Stonebraker, M., Deng, D., & Brodie, M. L. (2016b). Database decay and how to avoid it. In Proceedings of the IEEE International Conference on Big Data (pp. 1–10), Washington, DC.
  58. Stonebraker, M., Deng, D., & Brodie, M. L. (2017). Application-database co-evolution: A new design and development paradigm. In New England Database Day (pp. 1–3).
  59. van der Aalst, W. M. P. (2014). Data scientist: The engineer of the future. In K. Mertins, F. Bénaben, R. Poler, & J.-P. Bourrières (Eds.) Presented at the Enterprise Interoperability VI (pp. 13–26). Cham: Springer International Publishing.
  60. Veeramachaneni, K. (2016). Why you’re not getting value from your data science. Harvard Business Review, December 7, 2016.
  61. Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA
