Abstract
Automatic provenance collection describes systems that observe processes and data transformations, inferring, collecting, and maintaining provenance about them. Automatic collection is a powerful tool for the analysis of objects and processes, providing a level of transparency and pervasiveness not found in more conventional provenance systems. Unfortunately, automatic collection is also difficult. We discuss the challenges we encountered and the issues we exposed as we developed an automatic provenance collector that runs at the operating system level.
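The abstract does not describe the collector's internals. To illustrate what "inferring provenance from observed processes" can mean, here is a minimal sketch that assumes a hypothetical stream of `(pid, operation, path)` events, such as a system-call interceptor might emit. It applies a simple read/write dependency rule: a process depends on the files it reads, and a file depends on the process that writes it. The class `ProvenanceCollector` and its methods are illustrative names, not the authors' design or API.

```python
from collections import defaultdict


class ProvenanceCollector:
    """Hypothetical sketch: infer provenance edges from observed read/write events."""

    def __init__(self):
        # Maps each node ("proc:<pid>" or "file:<path>") to the set of nodes it derives from.
        self.parents = defaultdict(set)

    def observe(self, pid, op, path):
        proc = f"proc:{pid}"
        file = f"file:{path}"
        if op == "read":
            # The process now depends on the file it read.
            self.parents[proc].add(file)
        elif op == "write":
            # The written file now depends on the process that wrote it.
            self.parents[file].add(proc)
        else:
            raise ValueError(f"unknown operation: {op}")

    def ancestors(self, node):
        """Return every node that `node` transitively derives from."""
        seen, stack = set(), [node]
        while stack:
            # .get() avoids inserting empty entries into the defaultdict.
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:  # the visited set guarantees termination on cycles
                    seen.add(parent)
                    stack.append(parent)
        return seen


c = ProvenanceCollector()
c.observe(1, "read", "input.csv")
c.observe(1, "write", "output.txt")
c.observe(2, "read", "output.txt")
c.observe(2, "write", "report.pdf")
print(sorted(c.ancestors("file:report.pdf")))
```

This model is deliberately coarse. It does not version files or processes, so a process that reads and then rewrites the same file creates a cycle in the graph (the traversal's visited set keeps queries terminating). It also attributes to a written file every input the process reads later. Handling such cases is one of the practical difficulties an operating-system-level collector has to address.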
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Braun, U., Garfinkel, S., Holland, D.A., Muniswamy-Reddy, KK., Seltzer, M.I. (2006). Issues in Automatic Provenance Collection. In: Moreau, L., Foster, I. (eds) Provenance and Annotation of Data. IPAW 2006. Lecture Notes in Computer Science, vol 4145. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11890850_18
DOI: https://doi.org/10.1007/11890850_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46302-3
Online ISBN: 978-3-540-46303-0
eBook Packages: Computer Science, Computer Science (R0)