Abstract
A major challenge in designing data-analysis pipelines is identifying the services (referred to as modules in this paper) that are suitable for performing data preparation steps, which account for \(80\%\) of the modules that compose data analysis workflows. Such modules are ubiquitous and are used to perform operations such as record retrieval, format transformation, and data combination, among others. To assist scientists in discovering suitable modules, we examine a solution that uses semantic annotations describing the inputs and outputs of modules, together with data examples that characterize the modules’ behavior, as ingredients for the discovery of data preparation modules. The discovery strategy that we devised is iterative: it allows scientists to explore existing modules by providing feedback on data examples.
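The abstract does not give the discovery algorithm itself, so the following is only a minimal sketch of the general idea under stated assumptions: candidate modules are first filtered by their input and output annotations, then narrowed down using feedback on data examples. The `Module` class, the concept names, the example values, and the exact-match comparison of annotations are all hypothetical simplifications (a real system would likely reason over an ontology rather than compare strings).

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical description of a data preparation module."""
    name: str
    input_concept: str    # semantic annotation of the input (assumed to be a plain string)
    output_concept: str   # semantic annotation of the output
    examples: dict = field(default_factory=dict)  # data examples: input value -> output value


def match_annotations(modules, input_concept, output_concept):
    """Keep modules whose annotations equal the requested concepts (exact match, an assumption)."""
    return [
        m for m in modules
        if m.input_concept == input_concept and m.output_concept == output_concept
    ]


def refine(candidates, feedback):
    """Keep candidates whose data examples agree with the feedback.

    feedback is a list of (input_value, output_value, accepted) triples:
    accepted=True means the scientist wants this behavior,
    accepted=False means the scientist rejects it.
    Modules with no data example for a given input are not penalized.
    """
    kept = []
    for module in candidates:
        consistent = True
        for input_value, output_value, accepted in feedback:
            if input_value not in module.examples:
                continue  # no evidence either way
            produces = module.examples[input_value] == output_value
            if produces != accepted:
                consistent = False
                break
        if consistent:
            kept.append(module)
    return kept


# Hypothetical registry: two format transformers and one record retrieval module.
MODULES = [
    Module("to_iso_date", "DateString", "DateString", {"03/04/2020": "2020-04-03"}),
    Module("to_us_date", "DateString", "DateString", {"03/04/2020": "04/03/2020"}),
    Module("get_record", "Accession", "Record", {"P12345": "record for P12345"}),
]

# Iteration 1: filter by semantic annotations only.
candidates = match_annotations(MODULES, "DateString", "DateString")

# Iteration 2: the scientist accepts one data example, which narrows the candidates.
selected = refine(candidates, [("03/04/2020", "2020-04-03", True)])
```

In this sketch the annotations alone cannot distinguish the two format transformers; it is the feedback on a data example that separates them, which is the role the abstract assigns to data examples.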
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Belhajjame, K. (2020). On Discovering Data Preparation Modules Using Examples. In: Kafeza, E., Benatallah, B., Martinelli, F., Hacid, H., Bouguettaya, A., Motahari, H. (eds) Service-Oriented Computing. ICSOC 2020. Lecture Notes in Computer Science, vol 12571. Springer, Cham. https://doi.org/10.1007/978-3-030-65310-1_5
DOI: https://doi.org/10.1007/978-3-030-65310-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65309-5
Online ISBN: 978-3-030-65310-1
eBook Packages: Computer Science (R0)