Provenance Storage, Querying, and Visualization in PBase

Cuevas-Vicenttín, Víctor; Kianmajd, Parisa; Ludäscher, Bertram; Missier, Paolo; Chirigati, Fernando; Wei, Yaxing; Koop, David; Dey, Saumen

doi:10.1007/978-3-319-16462-5_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8628)

Included in the following conference series:

International Provenance and Annotation Workshop

Abstract

We present PBase, a repository for scientific workflows and their corresponding provenance information that facilitates the sharing of experiments among the scientific community. PBase is interoperable because it uses ProvONE, a standard provenance model for scientific workflows. Workflows and traces are stored in RDF, and SPARQL together with the tree cover encoding gives the repository a scalable infrastructure for querying the provenance data. Through its user interface, users can visualize workflows and execution traces, visualize reachability relations within these traces, issue SPARQL queries, and visualize query results.

1 Introduction

In the past few years, scientific workflows have often been used to define and execute a range of experiments. Because science is collaborative, there is a need for a repository that allows multiple users to store and query scientific workflow provenance information. Such a repository must also be interoperable, in the sense that workflow traces may come from different systems, and scalable, so that query evaluation remains efficient as the number and size of traces grow.

This paper presents PBase [CKL+14], which addresses three key points: facilitating the sharing of scientific workflows and their corresponding execution traces among the scientific community; allowing user interaction so that users can further explore the repository data; and providing both sharing and interaction in an interoperable and scalable manner. Our repository achieves these goals by: (i) using ProvONE [Dat14a], a standard provenance model that brings the advantages of the emerging W3C PROV standard [W3C13] and addresses the interoperability challenge; (ii) defining a representative set of queries, identified in collaboration with climate scientists, that characterizes the required functionality and user interaction; and (iii) providing a scalable infrastructure based on TDB, the RDF triplestore of the Jena Framework, which supports SPARQL, an expressive query language, and its efficient evaluation. PBase also incorporates the tree cover encoding proposed by Agrawal et al. [ABJ89] to improve the performance of reachability queries.

To the best of our knowledge, PBase is the first repository to address all the aforementioned challenges.

2 PBase Features

Interoperability. PBase uses ProvONE [Dat14a] to represent both prospective provenance (i.e., workflow specifications) and retrospective provenance (i.e., execution traces). ProvONE extends the W3C PROV standard [W3C13] and is specified through an ontology serialized in OWL 2. It aims to be expressive enough to cover most workflow models used by different scientific workflow management systems, which allows PBase to work in an interoperable manner.
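To make the distinction between prospective and retrospective provenance concrete, the following minimal sketch encodes one workflow step and one run of it as RDF-style triples and serializes them as N-Triples. It is an illustration only, not PBase code. The PROV and RDF namespaces are the W3C ones; the ProvONE namespace IRI and the ProvONE term names used here (`Workflow`, `Program`, `Execution`, `hasSubProgram`) are assumptions that should be checked against the specification at http://purl.org/provone, and every `example.org` resource is hypothetical.

```python
# W3C-defined namespaces.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"
# Assumption: ProvONE namespace IRI; verify against the ProvONE specification.
PROVONE = "http://purl.dataone.org/provone/2015/01/15/ontology#"
# Hypothetical resources used only for this illustration.
EX = "http://example.org/demo/"

TRIPLES = [
    # Prospective provenance: the workflow specification.
    (EX + "wf1", RDF_TYPE, PROVONE + "Workflow"),
    (EX + "regrid", RDF_TYPE, PROVONE + "Program"),
    (EX + "wf1", PROVONE + "hasSubProgram", EX + "regrid"),
    # Retrospective provenance: one execution and the data it touched.
    (EX + "run1", RDF_TYPE, PROVONE + "Execution"),
    (EX + "run1", PROV + "used", EX + "input.nc"),
    (EX + "output.nc", PROV + "wasGeneratedBy", EX + "run1"),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples text."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def subjects_of_type(triples, class_iri):
    """Return the sorted subjects declared to be instances of class_iri."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == class_iri)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(subjects_of_type(TRIPLES, PROVONE + "Execution"))
```

Because both kinds of provenance live in the same triple set, a single query can relate what was specified to what was actually run.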

User Interaction. An essential feature of a provenance repository is the ability to visualize a workflow and its various execution traces. PBase provides a Web GUI for this purpose (see Fig. 1). Furthermore, in collaboration with climate scientists, we have identified a series of queries, specified in SPARQL, that are representative of the functionality they require (these queries are available in [Dat14b]). Because users may not be familiar with SPARQL, PBase also allows these queries to be issued from the GUI through their textual descriptions. When a query produces results, they are presented as text and the corresponding provenance nodes are highlighted. To see the lineage of a particular node in a workflow or trace, users can select the node and use the option to highlight its ancestors and descendants.
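The lineage highlighting described above amounts to computing the ancestors and descendants of the selected node. A minimal sketch using plain breadth-first traversal follows; it is not the PBase implementation. The function name `lineage`, the node names, and the convention that an edge `(src, dst)` points from upstream to downstream are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def lineage(edges, node):
    """Return (ancestors, descendants) of node in a provenance graph.

    edges is an iterable of (src, dst) pairs, assumed to point from the
    upstream node to the downstream node.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)

    def closure(adjacency):
        # Breadth-first search that collects every node reachable from `node`.
        seen, queue = set(), deque([node])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen

    return closure(pred), closure(succ)


if __name__ == "__main__":
    # Hypothetical trace: two inputs feed a regridding step, which feeds a plot.
    trace = [("raw", "clean"), ("clean", "regrid"),
             ("mask", "regrid"), ("regrid", "plot")]
    ancestors, descendants = lineage(trace, "regrid")
    print(sorted(ancestors), sorted(descendants))
```

Each call walks the graph again, so the cost grows with the size of the trace.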

Scalability. We adopt RDF to store workflows and execution traces; in particular, we use TDB from the Jena Framework. As an example, XML traces from VisTrails can be uploaded through the Web; they are automatically translated into ProvONE RDF and stored in TDB. As mentioned before, PBase uses SPARQL to query the repository, which allows queries to be both expressive and efficiently evaluated. The tree cover encoding [ABJ89] is also implemented: it determines reachability relations between nodes by simply comparing integer intervals, thus avoiding more costly graph explorations and enhancing the performance of PBase.
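The idea behind the interval comparison can be sketched as follows. This is a simplified illustration in the spirit of the tree cover encoding of [ABJ89], not the PBase implementation: it assumes the graph is acyclic, picks the spanning forest arbitrarily (the first predecessor reached in topological order becomes the tree parent) rather than choosing a cover that keeps the number of intervals small, and uses hypothetical function names. Each node receives a postorder number and a list of integer intervals; a node `v` is reachable from `u` exactly when the number of `v` falls inside one of the intervals of `u`.

```python
from collections import defaultdict


def _merge(intervals):
    """Merge overlapping or adjacent integer intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def tree_cover_labels(edges):
    """Label a DAG so that reachability reduces to interval containment."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), []
    for u, v in edges:
        for n in (u, v):
            if n not in indeg:
                indeg[n] = 0
                nodes.append(n)
        succ[u].append(v)
        indeg[v] += 1
    roots = [n for n in nodes if indeg[n] == 0]

    # Topological order (Kahn); the first predecessor seen is the tree parent.
    remaining, queue, topo = dict(indeg), list(roots), []
    tree_children, has_parent = defaultdict(list), set()
    while queue:
        u = queue.pop()
        topo.append(u)
        for v in succ[u]:
            if v not in has_parent:
                has_parent.add(v)
                tree_children[u].append(v)
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    if len(topo) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Postorder numbering of the spanning forest; low[n] is the smallest
    # postorder number in the subtree of n.
    post, low, counter = {}, {}, 0
    for root in roots:
        stack = [(root, False)]
        while stack:
            node, done = stack.pop()
            if done:
                counter += 1
                post[node] = counter
            else:
                low[node] = counter + 1
                stack.append((node, True))
                stack.extend((c, False) for c in reversed(tree_children[node]))

    # Sinks first: each node inherits the intervals of all its successors.
    intervals = {}
    for u in reversed(topo):
        collected = [(low[u], post[u])]
        for v in succ[u]:
            collected.extend(intervals[v])
        intervals[u] = _merge(collected)
    return post, intervals


def reaches(post, intervals, u, v):
    """True if v is reachable from u (every node reaches itself)."""
    return any(lo <= post[v] <= hi for lo, hi in intervals[u])


if __name__ == "__main__":
    # Hypothetical trace graph.
    dag = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e"), ("f", "e")]
    post, intervals = tree_cover_labels(dag)
    print(reaches(post, intervals, "a", "e"))  # True
    print(reaches(post, intervals, "b", "c"))  # False
```

The labels are computed once; afterwards each reachability test only scans the interval list of the source node instead of traversing the graph.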

3 Conclusion

We have presented PBase, a repository for scientific workflows and their corresponding execution traces. It can be regarded as a step towards a repository supporting sophisticated provenance querying and analytics over a large collection of traces. PBase was developed in the context of DataONE, a large-scale, federated data infrastructure serving the Earth Sciences community, and our ultimate goal is to incorporate it into this infrastructure.

References

[ABJ89] Agrawal, R., Borgida, A., Jagadish, H.V.: Efficient management of transitive relationships in large data and knowledge bases. In: Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, SIGMOD 1989, pp. 253–262. ACM, New York (1989)
[CKL+14] Cuevas-Vicenttín, V., Kianmajd, P., Ludäscher, B., Missier, P., Chirigati, F.S., Wei, Y., Koop, D., Dey, S.C.: The PBase scientific workflow provenance repository. Int. J. Digit. Curation 9(2), 28–38 (2014)
[Dat14a] DataONE Provenance Working Group: ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance (2014). http://purl.org/provone
[Dat14b] DataONE Provenance Working Group: The ProvONE Scientific Workflow Provenance Dataset (2014). http://purl.org/provone/provbench
[W3C13] W3C Provenance Working Group: PROV Overview (2013). http://www.w3.org/TR/2013/NOTE-prov-overview-20130430/

Acknowledgments

The authors thank: members of the DataONE Provenance Working Group, for helping in the specification of PBase; and members of the DataONE EVA Working Group, for their collaboration. This work was supported by NSF Award OCI-0830944 (DataONE).

Author information

Authors and Affiliations

University of California at Davis, Davis, USA
Víctor Cuevas-Vicenttín, Parisa Kianmajd, Bertram Ludäscher & Saumen Dey
Newcastle University, Newcastle upon Tyne, UK
Paolo Missier
New York University, New York, USA
Fernando Chirigati & David Koop
Oak Ridge National Laboratory, Oak Ridge, USA
Yaxing Wei

Corresponding author

Correspondence to Víctor Cuevas-Vicenttín.

Editor information

Editors and Affiliations

University of Illinois, Urbana-Champaign, USA
Bertram Ludäscher
Indiana University, Bloomington, USA
Beth Plale

About this paper

Cite this paper

Cuevas-Vicenttín, V. et al. (2015). Provenance Storage, Querying, and Visualization in PBase. In: Ludäscher, B., Plale, B. (eds) Provenance and Annotation of Data and Processes. IPAW 2014. Lecture Notes in Computer Science, vol 8628. Springer, Cham. https://doi.org/10.1007/978-3-319-16462-5_24

DOI: https://doi.org/10.1007/978-3-319-16462-5_24
Published: 21 March 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16461-8
Online ISBN: 978-3-319-16462-5
eBook Packages: Computer Science, Computer Science (R0)
