Abstract
Analytics tasks in scientific and industrial environments are to be performed in some order that, as a whole, represent the rationale of a specific process on the data. The challenge to process the data is, beyond there mere size, their dispersion and the variety of their formats. The data analysis may include a range of tasks to be executed on a range of query engines, which are created by various users, such as business analysts, engineers, end-users etc. The users, depending on their role and expertise, may need or care for a different level of abstraction with respect to the execution of the individual tasks and overall process. Therefore, a system for Big Data analytics should enable the expression of tasks in an abstract manner, adaptable to the user role, interest and expertise. In this work we discuss the modelling of Big Data Analytics. We propose a novel representation model for analytics tasks and overall processes, that encapsulates their declaration, but, also, their execution semantics. The model allows for the definition of analytics processes with a varying level of abstraction, adaptable to the user role. Our motivation derives from real use cases.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
- 2.
Replication of tasks using many associative vertices that correspond to the same task of an original vertex may be needed for optimisation of workflow execution.
- 3.
Note that such tasks may involve also human interaction and may be performed online or offline.
References
Tuchinda, R., Thakkar, S., Gil, Y., Deelman, E.: Artemis: integrating scientific data on the grid. In: IAAA, pp. 25–29 (2004)
Simitsis, A., Wilkinson, K., Dayal, U., Hsu, M.: HFMS: managing the lifecycle and complexity of hybrid analytic data flows. In: ICDE 2013, pp. 1174–1185 (2013)
Simitsis, A., Wilkinson, K., Castellanos, M., Dayal, U.: Optimizing analytic data flows for multiple execution engines. In: SIGMOD 2012, pp. 829–840 (2012)
Pegasus. http://pegasus.isi.edu/
Deelman, E., Vahi, K., Juve, G., Rynge, M., Callaghan, S., Maechling, P.J., Mayani, R., Chen, W., da Silva, R.F., Livny, M., Wenger, K.: Pegasus: a workflow management system for science automation. Future Gener. Comput. Syst. 46, 17–35 (2015)
Malawski, M., Juve, G., Deelman, E., Nabrzyski, J.: Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds. In: SC 2012, pp. 22:1–22:11 (2012)
Oinn, T., Addis, M., Ferris, J., Marvin, D., Carver, T., Pocock, M.R., Wipat, A.: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinform. 20, 3045–3054 (2004)
Wolstencroft, K., Haines, R., Fellows, D., Williams, A.R., Withers, D., Owen, S., Soiland-Reyes, S., Dunlop, I., Nenadic, A., Fisher, P., Bhagat, J., Belhajjame, K., Bacall, F., Hardisty, A., de la Hidalga, A.N., Vargas, M.P.B., Sufi, S., Goble, C.A.: The taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Res. 41, 557–561 (2013)
Apache tez. http://hortonworks.com/hadoop/tez/
Alexandrov, A., Bergmann, R., Ewen, S., Freytag, J.-C., Hueske, F., Heise, A., Kao, O., Leich, M., Leser, U., Markl, V., Naumann, F., Peters, M., Rheinländer, A., Sax, M.J., Schelter, S., Höger, M., Tzoumas, K., Warneke, D.: The stratosphere platform for big data analytics. VLDB J. 23(6), 939–964 (2014)
Battré, D., Ewen, S., Hueske, F., Kao, O., Markl, V., Warneke, D.: Nephele/pacts: a programming model and execution framework for web-scale analytical processing. In: SoCC 2010, pp. 119–130 (2010)
Acknowledgment
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant Agreement \(n^o\) 619706 ASAP.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kantere, V., Filatov, M. (2015). Modelling Processes of Big Data Analytics. In: Wang, J., et al. Web Information Systems Engineering – WISE 2015. WISE 2015. Lecture Notes in Computer Science(), vol 9418. Springer, Cham. https://doi.org/10.1007/978-3-319-26190-4_21
Download citation
DOI: https://doi.org/10.1007/978-3-319-26190-4_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26189-8
Online ISBN: 978-3-319-26190-4
eBook Packages: Computer ScienceComputer Science (R0)