
Reconstructing Unsound Data Provenance View in Scientific Workflow

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7234)

Abstract

A data provenance view provides data abstraction and encapsulation by partitioning the tasks in the data provenance graph (DPG) of a scientific workflow into a set of composite modules according to the dataflow relations among them. This reduces both the effort researchers spend analyzing data provenance and the time needed to query the data. However, unless a view is carefully designed, it may not preserve the dataflow between tasks in the workflow; such a view is called unsound. To address this problem, we propose a method for reconstructing unsound views. We also design a polynomial-time algorithm for the reconstruction and analyze its worst-case time complexity. Finally, we present an example and conduct comprehensive experiments that demonstrate the feasibility and effectiveness of our method.
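To make the notion of an unsound view concrete, here is a minimal Python sketch of a soundness check. It assumes one common formalization: for each composite module, every task that receives data from outside the module must reach, in the original DPG, every task that sends data out of the module. If that fails, the view shows a dependency that does not exist in the workflow. The function names `reachable_from` and `unsound_modules` and the dictionary-based graph encoding are hypothetical and chosen only for illustration; the paper's own soundness definition and its polynomial-time reconstruction algorithm are not reproduced here.

```python
from collections import defaultdict


def reachable_from(graph, start):
    """Return the set of tasks reachable from `start` (including itself), via iterative DFS."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def unsound_modules(graph, view):
    """Return the names of composite modules that fail the (assumed) soundness condition.

    graph: dict mapping each task to a list of successor tasks (dataflow edges of the DPG).
    view:  dict mapping each composite module name to its set of tasks (a partition).
    """
    module_of = {t: m for m, tasks in view.items() for t in tasks}
    in_tasks = defaultdict(set)   # tasks that receive data from outside their module
    out_tasks = defaultdict(set)  # tasks that send data outside their module
    for u, succs in graph.items():
        for v in succs:
            if module_of[u] != module_of[v]:
                out_tasks[module_of[u]].add(u)
                in_tasks[module_of[v]].add(v)

    bad = []
    for m in view:
        for t_in in in_tasks[m]:
            # Every out-task of m must be reachable from every in-task of m.
            if not out_tasks[m] <= reachable_from(graph, t_in):
                bad.append(m)
                break
    return bad


# Usage: b and c share module M but are unrelated, so the view wrongly
# suggests that d depends on a.
dpg = {"a": ["b"], "c": ["d"]}
view = {"A": {"a"}, "M": {"b", "c"}, "D": {"d"}}
print(unsound_modules(dpg, view))  # ['M']
```

If the DPG instead contained the chain a → b → c → d, the same partition would pass the check, because b reaches c inside module M. Each in-task triggers one graph traversal, so this check alone runs in polynomial time.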

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 60873022, 60903053, and 61003047, the Natural Science Foundation of Zhejiang Province (Z1100822), and the Open Foundation of the State Key Laboratory for Novel Software Technology of Nanjing University (KFKT2011B07).





Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, H., Liu, Z., Hu, H. (2012). Reconstructing Unsound Data Provenance View in Scientific Workflow. In: Wang, H., et al. Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29426-6_25


  • DOI: https://doi.org/10.1007/978-3-642-29426-6_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29425-9

  • Online ISBN: 978-3-642-29426-6

  • eBook Packages: Computer Science, Computer Science (R0)
