
IDPC-XML: Integrated Data Provenance Capture in XML

  • Conference paper
Advances in Machine Learning and Data Science

Abstract

Data provenance is an acute issue on the World Wide Web because of the Web's openness and the ease of copying and combining interlinked data from different database sources. Data provenance is defined as the lineage of data and its movement between databases. Scientists and enterprises use their own analytical tools to process provenance data. Workflow management systems are currently popular in scientific domains because of the level of standardization of data formats and analysis. Using graph visualizations, scientists can easily view the data provenance associated with a scientific workflow to understand the methodology and validate the results. In this paper, we focus on a tool based on PROV-DM for collecting provenance data in an XML file and visualizing it as a directed graph. We also propose an approach named IDPC-XML for processing and managing the internal data using an XML file. The tool collects data provenance obtrusively on a local system using a self-generated log, and also collects provenance data in XML format, which can be visualized as a directed graph to understand the convergence. Relevant case studies of IDPC-XML are discussed and the scope for further research is identified.
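As a rough illustration of the idea described in the abstract (provenance stored in XML and then read back as a directed graph), the following minimal sketch may help. It is not the IDPC-XML tool itself. It assumes a simplified PROV-XML-style layout using the W3C PROV namespace, and the identifiers `raw_data`, `clean_data`, and `clean_step`, as well as the helper functions `build_document` and `provenance_edges`, are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"  # W3C PROV namespace


def q(tag):
    """Return the namespace-qualified name that ElementTree expects."""
    return f"{{{PROV_NS}}}{tag}"


def build_document():
    """Build a tiny, simplified PROV-XML-style document (hypothetical data)."""
    ET.register_namespace("prov", PROV_NS)
    doc = ET.Element(q("document"))
    # Two entities (data items) and one activity (a processing step).
    ET.SubElement(doc, q("entity"), {q("id"): "raw_data"})
    ET.SubElement(doc, q("entity"), {q("id"): "clean_data"})
    ET.SubElement(doc, q("activity"), {q("id"): "clean_step"})
    # clean_step used raw_data.
    used = ET.SubElement(doc, q("used"))
    ET.SubElement(used, q("activity"), {q("ref"): "clean_step"})
    ET.SubElement(used, q("entity"), {q("ref"): "raw_data"})
    # clean_data was generated by clean_step.
    gen = ET.SubElement(doc, q("wasGeneratedBy"))
    ET.SubElement(gen, q("entity"), {q("ref"): "clean_data"})
    ET.SubElement(gen, q("activity"), {q("ref"): "clean_step"})
    return ET.tostring(doc, encoding="unicode")


def provenance_edges(xml_text):
    """Read relation elements back as (source, relation, target) edges."""
    root = ET.fromstring(xml_text)
    edges = []
    for rel in root:
        # Relation elements carry exactly two children with a prov:ref;
        # entity and activity declarations have none and are skipped.
        refs = [child.get(q("ref")) for child in rel]
        refs = [r for r in refs if r is not None]
        if len(refs) == 2:
            relation = rel.tag.split("}")[-1]
            edges.append((refs[0], relation, refs[1]))
    return edges


if __name__ == "__main__":
    for source, relation, target in provenance_edges(build_document()):
        print(f"{source} --{relation}--> {target}")
```

The resulting edge list points from each item to what it depends on, so following the edges from `clean_data` leads back to `raw_data`. Such a list could be handed to any graph-drawing tool for visualization.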

Acknowledgements

This work is partially supported by the Indian Institute of Technology (ISM), Govt. of India. The authors wish to express their gratitude to the Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for its support in arranging the necessary computing facilities.

Author information

Corresponding author

Correspondence to Dharavath Ramesh.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ramesh, D., Biswas, H., Vallamdas, V.K. (2018). IDPC-XML: Integrated Data Provenance Capture in XML. In: Reddy Edla, D., Lingras, P., Venkatanareshbabu K. (eds) Advances in Machine Learning and Data Science. Advances in Intelligent Systems and Computing, vol 705. Springer, Singapore. https://doi.org/10.1007/978-981-10-8569-7_3

  • DOI: https://doi.org/10.1007/978-981-10-8569-7_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8568-0

  • Online ISBN: 978-981-10-8569-7

  • eBook Packages: Engineering, Engineering (R0)
