Abstract
The demand for graduates with big data and data warehousing skills far exceeds the supply. This paper addresses the problem by means of a pilot study in which big data topics were integrated into a classical data warehouse course at postgraduate level. Courses of this kind could support hands-on learning experience with big data warehousing.
Notes
- 1.
For readers outside South Africa: the South African ‘honours’ degree extends the classical ‘B.Sc.’ degree and qualifies a student to begin Master’s studies. Although it is already considered ‘postgraduate’ in South Africa, the ‘honours’ degree is roughly comparable to the final year of study in the (longer) U.S. ‘B.Sc.’ curriculum.
Acknowledgements
Thanks to Ian van der Linde for having configured the cluster with Apache Hadoop and Apache Hive. Thanks also to the anonymous reviewers for their valuable comments and suggestions that have improved the quality of this paper.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kotzé, E. (2017). Augmenting a Data Warehousing Curriculum with Emerging Big Data Technologies. In: Liebenberg, J., Gruner, S. (eds) ICT Education. SACLA 2017. Communications in Computer and Information Science, vol 730. Springer, Cham. https://doi.org/10.1007/978-3-319-69670-6_9
DOI: https://doi.org/10.1007/978-3-319-69670-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69669-0
Online ISBN: 978-3-319-69670-6
eBook Packages: Computer Science, Computer Science (R0)