Scalability Issues in Designing and Implementing Semantic Provenance Management Systems

  • Conference paper
Data Management in Cloud, Grid and P2P Systems (Globe 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7450)

Abstract

Provenance is key metadata for assessing the trustworthiness of electronic documents. Most applications that exchange and process documents on the web or in the cloud are becoming provenance-aware, yet the provenance data they provide is heterogeneous, decentralized, and not interoperable. A new type of system is emerging: the provenance management system (PMS). These systems offer a unified way to model, collect, and query provenance data from various applications.

This work presents such a system, based on semantic web technologies, and focuses on scalability issues. Modern infrastructures such as the cloud can produce huge volumes of provenance data, so scalability becomes a major concern.

We describe an implementation of our PMS based on a NoSQL DBMS coupled with the map-reduce parallel model, and present experiments illustrating that it scales linearly with the size of the processed logs.
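To make the map-reduce idea concrete, the following is a minimal single-process sketch in Python. It is not the authors' implementation: the abstract does not name the DBMS, the log schema, or the queries used, so the record fields (`doc`, `action`, `actor`), the sample entries, and the helper names are all hypothetical. It only illustrates how a provenance query (here, counting recorded actions per document) decomposes into a map step and a reduce step that a NoSQL store could run in parallel over partitions of the logs.

```python
from collections import defaultdict

# Hypothetical provenance log entries; the field names are assumptions
# made for illustration, not the schema used in the paper.
LOGS = [
    {"doc": "invoice-1", "action": "create", "actor": "alice"},
    {"doc": "invoice-1", "action": "sign", "actor": "bob"},
    {"doc": "report-7", "action": "create", "actor": "alice"},
]


def map_entry(entry):
    """Map step: emit one (document id, 1) pair per log entry."""
    yield entry["doc"], 1


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one document id."""
    return sum(values)


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group emitted values by key,
    then apply reducer to each group (sequentially, in one process)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


actions_per_doc = map_reduce(LOGS, map_entry, reduce_counts)
print(actions_per_doc)
```

Because each map call depends on a single log entry and each reduce call on a single key, both steps can be distributed across nodes, which is the property that makes near-linear scaling with log size plausible.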



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sakka, M.A., Defude, B. (2012). Scalability Issues in Designing and Implementing Semantic Provenance Management Systems. In: Hameurlain, A., Hussain, F.K., Morvan, F., Tjoa, A.M. (eds) Data Management in Cloud, Grid and P2P Systems. Globe 2012. Lecture Notes in Computer Science, vol 7450. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32344-7_5

  • DOI: https://doi.org/10.1007/978-3-642-32344-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32343-0

  • Online ISBN: 978-3-642-32344-7

  • eBook Packages: Computer Science, Computer Science (R0)
