Abstract
The increasingly computationally- and data-intensive nature of experimental science motivates recent interest in workflows as a fairly intuitive way to specify complex data processing and integration pipelines. Such workflows orchestrate the invocation of data retrieval services in a manner that resembles, to some extent, Search Computing query plans. However, while workflows are manually specified, query plans are the result of an automated translation process. Using lessons learnt from experience in workflow design, in this chapter we discuss some of the requirements on service curation that make automated, on-demand data integration processes possible and realistic.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Missier, P., Paton, N., Li, P. (2011). Workflows for Information Integration in the Life Sciences. In: Ceri, S., Brambilla, M. (eds) Search Computing. Lecture Notes in Computer Science, vol 6585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19668-3_20
DOI: https://doi.org/10.1007/978-3-642-19668-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19667-6
Online ISBN: 978-3-642-19668-3
eBook Packages: Computer Science (R0)