Reference Architecture for Running Large Scale Data Integration Experiments

  • Conference paper
Database and Expert Systems Applications (DEXA 2021)

Abstract

This paper contributes a reference architecture for a reusable infrastructure for running scientific experiments on data processing and data integration. The architecture is based on containerization and is integrated with an external machine learning cloud service, which is used to build performance models.

Notes

  1. http://www.tpc.org/tpcds/

  2. https://www.ibm.com/cloud/watson-studio/autoai


Acknowledgement

The work of Robert Wrembel is partially supported by IBM Shared University Reward 2019.

Corresponding author

Correspondence to Robert Wrembel.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bodziony, M., Wrembel, R. (2021). Reference Architecture for Running Large Scale Data Integration Experiments. In: Strauss, C., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2021. Lecture Notes in Computer Science, vol 12923. Springer, Cham. https://doi.org/10.1007/978-3-030-86472-9_1

  • DOI: https://doi.org/10.1007/978-3-030-86472-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86471-2

  • Online ISBN: 978-3-030-86472-9

  • eBook Packages: Computer Science (R0)
