Abstract
Provenance about workflow executions and data derivations in scientific applications helps estimate data quality, track resources, and validate in silico experiments. The Karma provenance framework provides a means to collect workflow, process, and data provenance from data-driven scientific workflows, and it is used in the Linked Environments for Atmospheric Discovery (LEAD) project. This article presents a performance analysis of the Karma service compared against the contemporary PReServ provenance service. Our study finds that Karma scales exceedingly well for collecting and querying provenance records. It shows linear or sub-linear scaling as the number of provenance records and clients increases, when tested against workloads on the order of 10,000 application-service invocations and more than 36 concurrent clients.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Simmhan, Y.L., Plale, B., Gannon, D., Marru, S. (2006). Performance Evaluation of the Karma Provenance Framework for Scientific Workflows. In: Moreau, L., Foster, I. (eds) Provenance and Annotation of Data. IPAW 2006. Lecture Notes in Computer Science, vol 4145. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11890850_23
DOI: https://doi.org/10.1007/11890850_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46302-3
Online ISBN: 978-3-540-46303-0
eBook Packages: Computer Science, Computer Science (R0)