
Data Science Challenges

Data Science Thinking

Part of the book series: Data Analytics ((DAANA))

Abstract

What are the greatest challenges of big data and data science? The question itself is problematic, because data science is still at a very early stage and is built on existing disciplines. This chapter explores this important issue.


Notes

  1. Further discussion of involving and synthesizing various complexities and intelligences is available in Sects. 5.3.2 and 5.3.3.



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter


Cite this chapter

Cao, L. (2018). Data Science Challenges. In: Data Science Thinking. Data Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-95092-1_4


  • DOI: https://doi.org/10.1007/978-3-319-95092-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-95091-4

  • Online ISBN: 978-3-319-95092-1

  • eBook Packages: Computer Science, Computer Science (R0)
