Abstract
Data provenance is a pressing issue on the World Wide Web because the Web is open and interlinked data from different database sources is easy to copy and combine. Data provenance is defined as the lineage of data and its movement between databases. Scientists and enterprises use their own analytical tools to process provenance data. Workflow management systems are currently popular in scientific domains because data formats and analyses there are highly standardized. Using graph visualizations, scientists can view the data provenance associated with a scientific workflow to understand the methodology and validate the results. In this paper, we focus on a PROV-DM-based tool that collects provenance data in an XML file and visualizes it as a directed graph. We also propose an approach named IDPC-XML for processing and managing internal data using XML files. The tool collects data provenance obtrusively on a local system using a self-generated log, and it also collects provenance data in XML format, which can be visualized as a directed graph to understand the convergence. Relevant case studies of IDPC-XML are discussed, and directions for further research are identified.
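The idea of recording provenance in XML and reading it back as a directed graph can be illustrated with a minimal sketch. It is not the IDPC-XML implementation, which this abstract does not describe in detail; it only assumes the W3C PROV-XML vocabulary (the prov namespace and the entity, activity, used and wasGeneratedBy elements). The ex: identifiers, the example.org namespace, and the helper names build_document and edges are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET

# W3C PROV namespace; PROV-XML elements and attributes live in it.
PROV = "http://www.w3.org/ns/prov#"
ET.register_namespace("prov", PROV)


def q(name):
    """Return the namespace-qualified name ElementTree expects."""
    return f"{{{PROV}}}{name}"


def build_document():
    """Build a tiny PROV-XML-style document (hypothetical example data)."""
    doc = ET.Element(q("document"))
    # Declare the illustrative 'ex' prefix used inside the identifier values.
    doc.set("xmlns:ex", "http://example.org/")

    ET.SubElement(doc, q("entity"), {q("id"): "ex:raw.csv"})
    ET.SubElement(doc, q("entity"), {q("id"): "ex:clean.csv"})
    ET.SubElement(doc, q("activity"), {q("id"): "ex:cleaning"})

    # The cleaning activity used the raw file ...
    used = ET.SubElement(doc, q("used"))
    ET.SubElement(used, q("activity"), {q("ref"): "ex:cleaning"})
    ET.SubElement(used, q("entity"), {q("ref"): "ex:raw.csv"})

    # ... and the clean file was generated by that activity.
    gen = ET.SubElement(doc, q("wasGeneratedBy"))
    ET.SubElement(gen, q("entity"), {q("ref"): "ex:clean.csv"})
    ET.SubElement(gen, q("activity"), {q("ref"): "ex:cleaning"})
    return doc


def edges(doc):
    """Turn each two-ended relation into a (source, label, target) edge."""
    result = []
    for rel in doc:
        refs = [child.get(q("ref")) for child in rel]
        # Node declarations (entity, activity) have no children and are skipped.
        if len(refs) == 2 and all(refs):
            label = rel.tag.split("}", 1)[1]
            result.append((refs[0], label, refs[1]))
    return result


if __name__ == "__main__":
    document = build_document()
    print(ET.tostring(document, encoding="unicode"))
    for source, label, target in edges(document):
        print(f"{source} -[{label}]-> {target}")
```

The edge list produced this way can be handed to any graph viewer; each edge points from the later item to the thing it depends on, which is the direction PROV relations are written in.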
Acknowledgements
This work is partially supported by Indian Institute of Technology (ISM), Govt. of India. The authors wish to express their gratitude and thanks to the Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for providing their support in arranging necessary computing facilities.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ramesh, D., Biswas, H., Vallamdas, V.K. (2018). IDPC-XML: Integrated Data Provenance Capture in XML. In: Reddy Edla, D., Lingras, P., Venkatanareshbabu K. (eds) Advances in Machine Learning and Data Science. Advances in Intelligent Systems and Computing, vol 705. Springer, Singapore. https://doi.org/10.1007/978-981-10-8569-7_3
DOI: https://doi.org/10.1007/978-981-10-8569-7_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8568-0
Online ISBN: 978-981-10-8569-7
eBook Packages: Engineering (R0)