Mining Workflow Repositories for Improving Fragments Reuse

  • Conference paper
Semantic Keyword-Based Search on Structured Data Sources (IKC 2015)

Abstract

Public repositories of scientific and business workflows are attracting growing attention as a means to enable the understanding, reuse and, ultimately, the reproducibility of the processes that such workflows embody. However, as the number of workflows hosted by such repositories grows, their users face difficulties when it comes to exploring and querying workflows. In this paper, we explore a functionality that can help repository administrators index their workflows, and help users identify the workflows that are of interest to them. In particular, we investigate the problem of finding frequent and similar fragments in workflows using graph mining techniques. Our objective is not to come up with yet another graph mining or similarity technique. Instead, we explore different representations that can be used for encoding workflows before assessing their similarity, taking into consideration the effectiveness and efficiency of the mining algorithm.
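To make the notion of a frequent fragment concrete, the following is a minimal sketch and not the mining algorithm or the workflow representations studied in the paper. It assumes a hypothetical toy repository in which each workflow is encoded as a set of directed edges between activity labels, and it only counts the smallest possible fragments (single labeled edges) that occur in at least `min_support` workflows. All workflow names, activity labels and the function name are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy repository (an assumption for illustration): each workflow
# is a set of directed edges (source_activity_label, target_activity_label).
workflows = {
    "wf1": {("fetch_sequence", "blast"), ("blast", "filter_hits")},
    "wf2": {("fetch_sequence", "blast"), ("blast", "render_report")},
    "wf3": {("load_csv", "filter_hits")},
}


def frequent_edges(repo, min_support):
    """Return the labeled edges that appear in at least min_support workflows."""
    counts = Counter()
    for edges in repo.values():
        # set() guarantees each edge is counted at most once per workflow.
        counts.update(set(edges))
    return {edge: n for edge, n in counts.items() if n >= min_support}


result = frequent_edges(workflows, min_support=2)
print(result)
```

In this encoding, two activities match only if their labels are identical, so the choice of labels directly affects which fragments are found to be frequent. A full graph mining approach would additionally grow such edges into larger connected fragments.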

Notes

  1. http://www.myexperiment.org/home

Author information

Corresponding author

Correspondence to Khalid Belhajjame.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Harmassi, M., Grigori, D., Belhajjame, K. (2015). Mining Workflow Repositories for Improving Fragments Reuse. In: Cardoso, J., Guerra, F., Houben, GJ., Pinto, A.M., Velegrakis, Y. (eds) Semantic Keyword-Based Search on Structured Data Sources. IKC 2015. Lecture Notes in Computer Science, vol 9398. Springer, Cham. https://doi.org/10.1007/978-3-319-27932-9_7

  • DOI: https://doi.org/10.1007/978-3-319-27932-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27931-2

  • Online ISBN: 978-3-319-27932-9

  • eBook Packages: Computer Science (R0)
