Abstract
Provenance is an increasingly important aspect of data management that practitioners often underestimate and neglect. We target the problem of reconstructing the provenance of files in a shared folder, assuming that only standard filesystem metadata are available. We propose a content-based approach that reconstructs provenance automatically by adapting several similarity measures and edit distance algorithms and integrating them into a multi-signal pipeline. We discuss our research methodology and present promising preliminary results.
Advisors: Paul Groth, Frank van Harmelen
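To make the idea of combining content signals with filesystem metadata concrete, the following is a minimal sketch and not the pipeline proposed in the paper. It assumes plain-text files, uses two illustrative signals (Jaccard similarity over word shingles and a normalised word-level Levenshtein distance), and treats the modification time as the only metadata available. The names `FileRecord`, `combined_score` and `candidate_derivations`, the equal weights, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from typing import NamedTuple, Sequence


class FileRecord(NamedTuple):
    """Hypothetical record: a file name, its mtime, and its text content."""
    name: str
    mtime: float  # modification time, e.g. seconds since the epoch
    text: str


def shingles(words: Sequence[str], k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (one shorter shingle if len < k)."""
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets."""
    sa, sb = shingles(a.lower().split(), k), shingles(b.lower().split(), k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 minus the word-level edit distance, normalised by the longer text."""
    wa, wb = a.lower().split(), b.lower().split()
    longest = max(len(wa), len(wb))
    return 1.0 if longest == 0 else 1.0 - levenshtein(wa, wb) / longest


def combined_score(a: str, b: str, w_jaccard: float = 0.5,
                   w_edit: float = 0.5) -> float:
    """Weighted sum of the two signals (weights are an assumption)."""
    return w_jaccard * jaccard(a, b) + w_edit * edit_similarity(a, b)


def candidate_derivations(files: Sequence[FileRecord],
                          threshold: float = 0.5
                          ) -> list[tuple[str, str, float]]:
    """Propose (older, newer, score) edges for sufficiently similar pairs.

    The direction of each edge follows the modification time only, which is
    a simplifying assumption: mtime can be misleading in practice.
    """
    ordered = sorted(files, key=lambda f: f.mtime)
    edges = []
    for i, older in enumerate(ordered):
        for newer in ordered[i + 1:]:
            score = combined_score(older.text, newer.text)
            if score >= threshold:
                edges.append((older.name, newer.name, score))
    return edges


if __name__ == "__main__":
    folder = [
        FileRecord("draft.txt", 1.0,
                   "provenance is important for data management"),
        FileRecord("final.txt", 2.0,
                   "provenance is very important for data management"),
        FileRecord("other.txt", 3.0,
                   "the quick brown fox jumps over the lazy dog"),
    ]
    for older, newer, score in candidate_derivations(folder):
        print(f"{newer} possibly derived from {older} (score {score:.2f})")
```

In this toy folder only the draft/final pair clears the threshold. A realistic pipeline would need more signals, format-aware content extraction, and a principled way to weight and validate the resulting hypotheses.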
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Magliacane, S. (2012). Reconstructing Provenance. In: Cudré-Mauroux, P., et al. The Semantic Web – ISWC 2012. ISWC 2012. Lecture Notes in Computer Science, vol 7650. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35173-0_29
DOI: https://doi.org/10.1007/978-3-642-35173-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35172-3
Online ISBN: 978-3-642-35173-0
eBook Packages: Computer Science (R0)