Incremental Inference of Provenance Types

Kohan Marzagão, David; Huynh, Trung Dong; Moreau, Luc

doi:10.1007/978-3-030-80960-7_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12839)

Abstract

Long-running applications are increasingly instrumented to log provenance continuously. In that context, we observe an emerging need to process the fragments of provenance that applications continuously produce. There is therefore a growing requirement to process provenance incrementally, while the application is still running, rather than batch processing a complete provenance dataset that becomes available only after the application has completed. A type of processing of particular interest is the summarisation of provenance graphs, which has been proposed as an effective way of extracting key features of provenance and storing them efficiently. To that end, summarisation makes use of provenance types, which, in loose terms, are an encoding of the neighbourhood of nodes.

This paper shows that the process of creating provenance summaries of continuously provided data can benefit from a mode of incremental processing of provenance types. We also introduce the concept of a library of types to reduce the need for storing copies of the same string representations for types multiple times. Further, we show that the computational complexity associated with the task of inferring types is, in most common cases, the best possible: only new nodes have to be processed. We also identify and analyse the exception scenarios. Finally, although our library of types, in theory, can be exponentially large, we present empirical results that show it is quite compact in practice.
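The two ideas above, a shared library of types and processing only new nodes, can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a simplified notion of type in which the depth-k type of a node is its label set together with the set of (edge label, depth-(k-1) type) pairs reached through its outgoing edges, and it assumes that every new node points only at nodes that are already known. The names `TypeLibrary`, `IncrementalTyper`, and `add_node` are hypothetical.

```python
from collections import defaultdict


class TypeLibrary:
    """Stores each distinct type once and refers to it by a small integer id."""

    def __init__(self):
        self._ids = {}    # canonical type -> id
        self.types = []   # id -> canonical type

    def intern(self, canonical):
        if canonical not in self._ids:
            self._ids[canonical] = len(self.types)
            self.types.append(canonical)
        return self._ids[canonical]


class IncrementalTyper:
    """Infers types up to a fixed depth as nodes arrive (simplified model)."""

    def __init__(self, depth):
        self.depth = depth
        self.library = TypeLibrary()
        self.out_edges = defaultdict(list)  # node -> [(edge_label, target)]
        self.type_ids = {}                  # node -> [id of 0-type, ..., id of depth-type]
        self.processed = 0                  # number of nodes whose types were computed

    def add_node(self, node, labels, edges=()):
        """Add a new node; assumes its outgoing edges target known nodes only."""
        edges = list(edges)
        for _, target in edges:
            if target not in self.type_ids:
                raise KeyError(f"unknown target {target!r}")
        labels = frozenset(labels)
        self.out_edges[node] = edges
        # The 0-type looks at the node's labels only.
        ids = [self.library.intern((labels, frozenset()))]
        for k in range(1, self.depth + 1):
            # The k-type reuses the already stored (k-1)-type ids of the targets.
            neighbourhood = frozenset(
                (edge_label, self.type_ids[target][k - 1])
                for edge_label, target in edges
            )
            ids.append(self.library.intern((labels, neighbourhood)))
        self.type_ids[node] = ids
        self.processed += 1
        return ids


# Usage: two structurally identical derivations share the same library entries.
typer = IncrementalTyper(depth=2)
typer.add_node("e1", {"entity"})
typer.add_node("a1", {"activity"}, [("used", "e1")])
typer.add_node("e2", {"entity"}, [("wasGeneratedBy", "a1")])
typer.add_node("a2", {"activity"}, [("used", "e1")])
typer.add_node("e3", {"entity"}, [("wasGeneratedBy", "a2")])
```

In this simplified model a type depends only on outgoing edges, so adding a node never changes the types of existing nodes and each arrival costs one computation. Adding an edge out of an existing node would break that property in the sketch, since that node and everything pointing at it would need to be retyped.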

This work is supported by a Department of Navy award (Award No. N62909-18-1-2079) issued by the Office of Naval Research. The United States Government has a royalty-free license throughout the world in all copyrightable material contained herein.

Notes

1. Note that when application types are included, a provenance expression may have more than one label, e.g. \(lab(v) = \{\text{ag}, \text{Prov:Operator}\}\). When \(lab(v)\) is a singleton set, we abuse notation and omit the set brackets.
2. Note that removing a node automatically removes all edges connected to it.
3. For readability, we index elements of map \(\mathcal{T}_{k}\) with k.
4. Recall that nodes may have more than one label.

Author information

Authors and Affiliations

King’s College London, London, WC2B 4BG, UK
David Kohan Marzagão, Trung Dong Huynh & Luc Moreau

Corresponding author

Correspondence to David Kohan Marzagão.

Editor information

Editors and Affiliations

Illinois Institute of Technology, Chicago, IL, USA
Boris Glavic
Fluminense Federal University, Niterói, Brazil
Vanessa Braganholo
Northern Illinois University, DeKalb, IL, USA
David Koop

About this paper

Cite this paper

Kohan Marzagão, D., Huynh, T.D., Moreau, L. (2021). Incremental Inference of Provenance Types. In: Glavic, B., Braganholo, V., Koop, D. (eds) Provenance and Annotation of Data and Processes. IPAW 2020, IPAW 2021. Lecture Notes in Computer Science, vol 12839. Springer, Cham. https://doi.org/10.1007/978-3-030-80960-7_9

DOI: https://doi.org/10.1007/978-3-030-80960-7_9
Published: 09 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80959-1
Online ISBN: 978-3-030-80960-7
eBook Packages: Computer Science; Computer Science (R0)
