Augmenting a Data Warehousing Curriculum with Emerging Big Data Technologies

  • Conference paper
ICT Education (SACLA 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 730)

Abstract

The demand for graduates with big data and data warehousing skills far exceeds the supply. This paper addresses the problem by means of a pilot study in which big data topics were integrated into a classical data warehouse course at postgraduate level. Courses like this could help to support hands-on learning experience with big data warehousing.

Notes

  1. For readers from outside South Africa: the South African ‘honours’ degree is an extension of the classical ‘B.Sc.’ degree which enables a student to commence Master's studies thereafter. Although it is already considered ‘postgraduate’ in South Africa, the ‘honours’ degree is reasonably comparable to the final study year of the (longer) U.S. ‘B.Sc.’ curriculum.

  2. http://www.seanlahman.com/baseball-archive/statistics/.

Acknowledgements

Thanks to Ian van der Linde for having configured the cluster with Apache Hadoop and Apache Hive. Thanks also to the anonymous reviewers for their valuable comments and suggestions that have improved the quality of this paper.

Author information

Corresponding author

Correspondence to Eduan Kotzé.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kotzé, E. (2017). Augmenting a Data Warehousing Curriculum with Emerging Big Data Technologies. In: Liebenberg, J., Gruner, S. (eds) ICT Education. SACLA 2017. Communications in Computer and Information Science, vol 730. Springer, Cham. https://doi.org/10.1007/978-3-319-69670-6_9

  • DOI: https://doi.org/10.1007/978-3-319-69670-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69669-0

  • Online ISBN: 978-3-319-69670-6

  • eBook Packages: Computer Science (R0)
