Abstract
A data provenance view provides data abstraction and encapsulation by partitioning the tasks in the data provenance graph (DPG) of a scientific workflow into a set of composite modules according to the dataflow relations among them. This reduces both the effort researchers spend analyzing data provenance and the time needed for provenance queries. However, unless a view is carefully designed, it may not preserve the dataflow between tasks in the workflow. To address this problem, we propose a method for reconstructing unsound views. We also design a polynomial-time algorithm and analyze its worst-case time complexity. Finally, we give an example and conduct comprehensive experiments to show the feasibility and effectiveness of our method.
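To make the notion of an unsound view concrete, the following is a minimal sketch, not the paper's algorithm. It assumes one simple reading of "preserving dataflow": whenever the view shows a path from composite module A to composite module B, at least one task in A must actually reach a task in B in the original DPG. The function names, the edge-list representation, and the `module_of` mapping are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def reachable(edges, start):
    """Return all nodes reachable from `start` by following one or more edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def view_edges(edges, module_of):
    """Edges induced between composite modules by the task-level edges."""
    return {
        (module_of[u], module_of[v])
        for u, v in edges
        if module_of[u] != module_of[v]
    }


def spurious_dependencies(edges, module_of):
    """List module pairs (A, B) that the view connects by a path although
    no task in A reaches any task in B in the original graph.
    An empty result means the view passes this (assumed) soundness check."""
    members = defaultdict(set)
    for task, module in module_of.items():
        members[module].add(task)

    induced = view_edges(edges, module_of)
    bad = []
    for a in sorted(members):
        for b in sorted(reachable(induced, a)):
            if a == b:
                continue
            # Does any task of module a reach any task of module b?
            if not any(reachable(edges, t) & members[b] for t in members[a]):
                bad.append((a, b))
    return bad


# Hypothetical workflow: a -> b and c -> d, with b and c unrelated.
dpg = [("a", "b"), ("c", "d")]

# Grouping b and c into one module M makes d appear to depend on a.
unsound_view = {"a": "A", "b": "M", "c": "M", "d": "D"}
print(spurious_dependencies(dpg, unsound_view))
```

In this toy case the view graph contains the path A to M to D, yet task `a` never reaches task `d`, so the pair is reported. The check runs one graph traversal per task and per module, which keeps it polynomial in the size of the DPG.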
This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 60873022, 60903053, and 61003047; the Natural Science Foundation of Zhejiang Province (Z1100822); and the Open Foundation of the State Key Laboratory for Novel Software Technology of Nanjing University (KFKT2011B07).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, H., Liu, Z., Hu, H. (2012). Reconstructing Unsound Data Provenance View in Scientific Workflow. In: Wang, H., et al. Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29426-6_25
DOI: https://doi.org/10.1007/978-3-642-29426-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29425-9
Online ISBN: 978-3-642-29426-6
eBook Packages: Computer Science, Computer Science (R0)