Abstract
Academic programmes at South African Higher Education Institutions have predominantly educated students in managing and storing data using relational database technology. However, this is no longer sufficient. South Africa as a country will need to educate more students to manage and process structured, semi-structured and unstructured data. The main purpose of this study was to examine the status of data scientists, a role typically associated with managing these new data sets, in South Africa. The study examined the skills, knowledge and qualifications these data scientists require to do their daily tasks, and offers suggestions that ought to be considered when designing a curriculum for an academic programme in data science.
Notes
1. Example answers included: “transforming and shifting data”, “processing of large volumes of data (ETL/data pipeline)”, “data preparation for statistical models”, as well as “data warehousing, reporting, ETL development”.
Acknowledgements
Thanks to all respondents who made this study possible. Thanks to Anelize van Biljon for her helpful comments on drafts of this paper. Last but not least, thanks to the anonymous reviewers of SACLA’2017 for their valuable comments and suggestions.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kotzé, E. (2017). A Survey of Data Scientists in South Africa. In: Liebenberg, J., Gruner, S. (eds) ICT Education. SACLA 2017. Communications in Computer and Information Science, vol 730. Springer, Cham. https://doi.org/10.1007/978-3-319-69670-6_12
DOI: https://doi.org/10.1007/978-3-319-69670-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69669-0
Online ISBN: 978-3-319-69670-6
eBook Packages: Computer Science (R0)