IDPC-XML: Integrated Data Provenance Capture in XML

  • Dharavath Ramesh
  • Himangshu Biswas
  • Vijay Kumar Vallamdas
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 705)


In the contemporary world, data provenance is an acute issue on the World Wide Web because of the Web's openness and the ease of copying and combining interlinked data from different database sources. Data provenance is defined as the lineage of data and its movement between databases. Scientists and enterprises use their own analytical tools to process data provenance. Workflow management systems are currently popular in scientific domains because of the level of standardization of data formats and analysis. Using graph visualizations, scientists can easily view the data provenance associated with a scientific workflow to understand the methodology and validate the results. In this paper, we emphasize a tool based on PROV-DM for collecting provenance data in an XML file and visualizing it as a directed graph. We also propose an approach named IDPC-XML for processing and managing the internal data using an XML file. This tool collects data provenance obtrusively in a local system using a self-generated log and also collects provenance data in XML format, which can be visualized as a directed graph to understand the convergence. Relevant case studies of IDPC-XML are discussed, and further research scope is identified.
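To make the idea of provenance stored in XML and viewed as a directed graph concrete, the following is a minimal sketch and not the IDPC-XML tool itself. It assumes a small document in the W3C PROV-XML vocabulary (`prov:used`, `prov:wasGeneratedBy`); the sample identifiers such as `ex:raw.csv` and the helper names `provenance_edges` and `to_dot` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"

# Hypothetical sample document; the "ex:" identifiers are invented for this sketch.
SAMPLE = """<prov:document xmlns:prov="http://www.w3.org/ns/prov#"
                           xmlns:ex="http://example.org/">
  <prov:entity prov:id="ex:raw.csv"/>
  <prov:entity prov:id="ex:report.xml"/>
  <prov:activity prov:id="ex:transform"/>
  <prov:used>
    <prov:activity prov:ref="ex:transform"/>
    <prov:entity prov:ref="ex:raw.csv"/>
  </prov:used>
  <prov:wasGeneratedBy>
    <prov:entity prov:ref="ex:report.xml"/>
    <prov:activity prov:ref="ex:transform"/>
  </prov:wasGeneratedBy>
</prov:document>"""


def q(local):
    """Return the namespace-qualified name ElementTree uses for a PROV tag or attribute."""
    return f"{{{PROV_NS}}}{local}"


# Relation name -> (child tag of the edge source, child tag of the edge target).
RELATIONS = {
    "used": ("activity", "entity"),
    "wasGeneratedBy": ("entity", "activity"),
}


def provenance_edges(xml_text):
    """Extract (source, relation, target) triples from a PROV-XML string."""
    root = ET.fromstring(xml_text)
    edges = []
    for relation, (src_tag, dst_tag) in RELATIONS.items():
        for node in root.iter(q(relation)):
            src = node.find(q(src_tag))
            dst = node.find(q(dst_tag))
            if src is not None and dst is not None:
                edges.append((src.get(q("ref")), relation, dst.get(q("ref"))))
    return edges


def to_dot(edges):
    """Render the triples as a Graphviz DOT digraph for visual inspection."""
    lines = ["digraph provenance {"]
    for src, label, dst in edges:
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(provenance_edges(SAMPLE)))
```

The edges point from the dependent item to what it depends on, which follows the direction of the PROV relations; the DOT text can be passed to any Graphviz-compatible viewer. Only two relation types are handled here to keep the sketch short.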


Data provenance, PROV-DM, IDPC-XML, Visualization, XML



This work is partially supported by Indian Institute of Technology (ISM), Govt. of India. The authors wish to express their gratitude and thanks to the Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for providing their support in arranging necessary computing facilities.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Dharavath Ramesh (1)
  • Himangshu Biswas (1)
  • Vijay Kumar Vallamdas (1)
  1. Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India
