Abstract
Public repositories of scientific and business workflows are attracting growing attention as a means of enabling the understanding, reuse and, ultimately, reproducibility of the processes that such workflows embody. However, as the number of workflows hosted by such repositories grows, users face difficulties when it comes to exploring and querying them. In this paper, we explore a functionality that can help repository administrators index their workflows and help users identify the workflows of interest to them. In particular, we investigate the problem of finding frequent and similar fragments in workflows using graph mining techniques. Our objective is not to come up with yet another graph mining or similarity technique. Instead, we explore different representations that can be used to encode workflows before assessing their similarity, taking into consideration the effectiveness and efficiency of the mining algorithm.
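To make the notion of a frequent fragment concrete, the following is a minimal sketch and not the method used in the paper. It assumes one possible encoding, in which each workflow is a list of labeled data links between steps, and it only counts single-edge fragments. The repository contents, the step names and the function `frequent_edges` are hypothetical and serve purely as illustration.

```python
from collections import Counter

# Hypothetical toy repository (assumption, not data from the paper):
# each workflow is encoded as a list of (source_step, target_step) data links.
workflows = {
    "wf1": [("fetch_sequence", "run_blast"), ("run_blast", "parse_hits")],
    "wf2": [("fetch_sequence", "run_blast"), ("run_blast", "render_report")],
    "wf3": [("load_table", "run_blast"), ("run_blast", "parse_hits")],
}


def frequent_edges(repo, min_support):
    """Return labeled edges that occur in at least min_support workflows."""
    support = Counter()
    for edges in repo.values():
        # Count each edge once per workflow, so support = number of workflows.
        support.update(set(edges))
    return {edge: n for edge, n in support.items() if n >= min_support}


result = frequent_edges(workflows, min_support=2)
for edge, n in sorted(result.items()):
    print(edge, n)
```

A graph mining algorithm would grow such single-edge fragments into larger subgraphs. The choice of encoding (for example, which step attributes become labels) changes which fragments count as identical, which is the kind of representation question the abstract raises.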
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Harmassi, M., Grigori, D., Belhajjame, K. (2015). Mining Workflow Repositories for Improving Fragments Reuse. In: Cardoso, J., Guerra, F., Houben, GJ., Pinto, A.M., Velegrakis, Y. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2015. Lecture Notes in Computer Science, vol 9398. Springer, Cham. https://doi.org/10.1007/978-3-319-27932-9_7
DOI: https://doi.org/10.1007/978-3-319-27932-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27931-2
Online ISBN: 978-3-319-27932-9
eBook Packages: Computer Science (R0)