Document Provenance in the Cloud: Constraints and Challenges

  • Mohamed Amin Sakka
  • Bruno Defude
  • Jorge Tellez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6164)


Digital information is growing in volume and complexity. With the emergence of distributed services over the Internet and the boom in electronic exchanges, identifying the origins of information and its lifecycle history has become essential, because it is the only factor ensuring information integrity and probative value. That is why, in areas as different as government, commerce, medicine and science, tracking data origins is essential and can serve informational, quality, forensic, regulatory compliance, rights protection and intellectual property purposes. Managing information provenance is a complex task, and it has been extensively treated in databases, file systems and scientific workflows. However, provenance in the cloud is more challenging, because cloud-specific problems add to the traditional ones.


Keywords: Cloud Computing, Electronic Document, Data Provenance, Information Provenance, Open Archival Information System



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Mohamed Amin Sakka (1, 2)
  • Bruno Defude (1)
  • Jorge Tellez (2)
  1. Novapost, Novapost R&D, Paris, France
  2. TELECOM & Management SudParis, CNRS UMR Samovar, Evry cedex, France
