Abstract
Scientific workflows help in designing, managing, monitoring, and executing in-silico experiments. Because scientific workflows are often complex, sharing them through public workflow repositories has become important to the community. However, as the number of workflows in such repositories grows, users increasingly need assistance in discovering the right workflow for a given task. A key first step toward meaningful workflow similarity measures is identifying the functional elements that workflows share. In this paper, we present the results of a study of what is probably the largest open workflow repository, myExperiment.org. Our contributions are threefold: (i) we discuss the critical problem of identifying identical or similar (sub-)workflows and workflow elements; (ii) we study, for the first time, the problem of cross-author reuse; and (iii) we provide a detailed analysis of how frequently elements are reused between workflows and authors, and identify characteristics of shared elements.
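To make the idea of shared elements concrete, the following is a minimal sketch, not the method used in the study. It assumes each workflow can be reduced to a set of element identifiers plus an author name, uses Jaccard overlap as a stand-in similarity measure, and counts for each element how many workflows and how many distinct authors use it. The data, the function names `jaccard` and `reuse_counts`, and the choice of Jaccard are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy data: workflow id -> (author, set of element identifiers).
workflows = {
    "wf1": ("alice", {"blast", "fetch_sequence", "parse_xml"}),
    "wf2": ("alice", {"blast", "fetch_sequence", "plot"}),
    "wf3": ("bob", {"blast", "render_report"}),
}


def jaccard(a, b):
    """Fraction of elements common to both sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reuse_counts(workflows):
    """Map each element to (number of workflows, number of distinct authors) using it."""
    wf_count = defaultdict(int)
    authors = defaultdict(set)
    for author, elements in workflows.values():
        for element in elements:
            wf_count[element] += 1
            authors[element].add(author)
    return {e: (wf_count[e], len(authors[e])) for e in wf_count}


# Similarity of two workflows based on the elements they share.
similarity = jaccard(workflows["wf1"][1], workflows["wf2"][1])

# Elements used by more than one author (cross-author reuse).
counts = reuse_counts(workflows)
cross_author = sorted(e for e, (_, n_authors) in counts.items() if n_authors > 1)
```

Separating the per-workflow count from the per-author count matters here: an element that one author copies across many of their own workflows is reused often, but it is not evidence of cross-author reuse.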
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Starlinger, J., Cohen-Boulakia, S., Leser, U. (2012). (Re)Use in Public Scientific Workflow Repositories. In: Ailamaki, A., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2012. Lecture Notes in Computer Science, vol 7338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31235-9_24
DOI: https://doi.org/10.1007/978-3-642-31235-9_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31234-2
Online ISBN: 978-3-642-31235-9
eBook Packages: Computer Science, Computer Science (R0)