Agile Approach to Develop Data Lake Based Systems

Towards Industry 4.0 — Current Challenges in Information Systems

Part of the book series: Studies in Computational Intelligence (SCI, volume 887)

Abstract

The chapter presents an incremental approach based on agile principles as an alternative methodology for developing an analytical system in a light data lake (LDL) environment. The proposed procedure for building an LDL system was evaluated in a case study at a European restaurant company operating in several European countries. The data analysis models obtained in the sprints proved reliable, and the approach as a whole proved effective. In addition, the approach gives companies considerable flexibility: they can develop each dimension of the system independently and with varying degrees of intensity, according to their needs and financial resources.

Notes

  1. A good example of a venue presenting numerous Jupyter Notebook (JN) based projects is the annual JupyterCon conference (see: https://conferences.oreilly.com/jupyter/jup-ny).

Acknowledgements

The project is financed by the Ministry of Science and Higher Education in Poland under the programme “Regional Initiative of Excellence” 2019–2022, project number 015/RID/2018/19, total funding amount 10,721,040.00 PLN.

Author information

Corresponding author

Correspondence to Monika Sitarska-Buba.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Gryncewicz, W., Sitarska-Buba, M., Zygała, R. (2020). Agile Approach to Develop Data Lake Based Systems. In: Hernes, M., Rot, A., Jelonek, D. (eds) Towards Industry 4.0 — Current Challenges in Information Systems. Studies in Computational Intelligence, vol 887. Springer, Cham. https://doi.org/10.1007/978-3-030-40417-8_12
