Modelling Processes of Big Data Analytics

  • Conference paper
Web Information Systems Engineering – WISE 2015 (WISE 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9418)

Abstract

Analytics tasks in scientific and industrial environments are to be performed in some order that, as a whole, represents the rationale of a specific process on the data. The challenge in processing the data lies, beyond their mere size, in their dispersion and the variety of their formats. The data analysis may include a range of tasks, created by various users such as business analysts, engineers and end-users, to be executed on a range of query engines. Depending on their role and expertise, users may need or care for a different level of abstraction with respect to the execution of the individual tasks and of the overall process. Therefore, a system for Big Data analytics should enable the expression of tasks in an abstract manner, adaptable to the user's role, interest and expertise. In this work we discuss the modelling of Big Data analytics. We propose a novel representation model for analytics tasks and overall processes that encapsulates not only their declaration but also their execution semantics. The model allows for the definition of analytics processes with a varying level of abstraction, adaptable to the user role. Our motivation derives from real use cases.

Notes

  1. http://www.asap-fp7.eu.

  2. Replication of tasks using many associative vertices that correspond to the same task of an original vertex may be needed for optimisation of workflow execution.

  3. Note that such tasks may also involve human interaction and may be performed online or offline.

Acknowledgment

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant Agreement no. 619706 ASAP.

Author information

Corresponding author

Correspondence to Verena Kantere.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kantere, V., Filatov, M. (2015). Modelling Processes of Big Data Analytics. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science, vol 9418. Springer, Cham. https://doi.org/10.1007/978-3-319-26190-4_21

  • DOI: https://doi.org/10.1007/978-3-319-26190-4_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26189-8

  • Online ISBN: 978-3-319-26190-4

  • eBook Packages: Computer Science, Computer Science (R0)
