Mining Workflow Repositories for Improving Fragments Reuse

  • Mariem Harmassi
  • Daniela Grigori
  • Khalid Belhajjame
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9398)


Public repositories of scientific and business workflows are attracting growing attention as a means to enable the understanding, reuse and, ultimately, the reproducibility of the processes such workflows embody. However, as the number of workflows hosted by such repositories grows, their users face difficulties when it comes to exploring and querying workflows. In this paper, we explore a functionality that can help repository administrators to index their workflows, and users to identify the workflows that are of interest to them. In particular, we investigate the problem of finding frequent and similar fragments in workflows using graph mining techniques. Our objective is not to come up with yet another graph mining or similarity technique. Instead, we explore different representations for encoding workflows prior to assessing their similarity, taking into account both the effectiveness and the efficiency of the mining algorithm.
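The central idea of the abstract, that the encoding chosen for a workflow changes which fragments appear frequent or similar, can be illustrated with a deliberately simplified sketch. It is not the authors' method. Assumptions: each workflow is reduced to its data links between steps; fragments are limited to single labelled edges (a full frequent subgraph miner would grow larger connected fragments); and the tool names, the `CATEGORY` mapping and the helpers `encode`, `frequent_fragments`, `jaccard`, `by_name` and `by_category` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical encoding: a workflow is a list of (source_step, target_step) data links.
Workflow = List[Tuple[str, str]]
Fragment = Tuple[str, str]


def encode(workflow: Workflow, label: Callable[[str], str]) -> Set[Fragment]:
    """Encode a workflow as the set of labelled edges it contains.

    `label` maps a raw step name to the label used for mining, so that
    different representations (exact names vs. coarser categories) can be compared.
    """
    return {(label(src), label(dst)) for src, dst in workflow}


def frequent_fragments(workflows: List[Workflow],
                       label: Callable[[str], str],
                       min_support: int) -> Dict[Fragment, int]:
    """Return single-edge fragments that occur in at least `min_support` workflows."""
    support: Counter = Counter()
    for wf in workflows:
        # One set per workflow, so a fragment is counted at most once per workflow.
        support.update(encode(wf, label))
    return {frag: n for frag, n in support.items() if n >= min_support}


def jaccard(a: Set[Fragment], b: Set[Fragment]) -> float:
    """Similarity of two encoded workflows: Jaccard index of their fragment sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical mapping from concrete tools to coarser functional categories.
CATEGORY = {
    "fetch_ncbi": "retrieve", "fetch_ebi": "retrieve",
    "blast": "align", "clustalw": "align",
    "render_tree": "visualise",
}


def by_name(step: str) -> str:
    """Representation 1: label each step by its exact tool name."""
    return step


def by_category(step: str) -> str:
    """Representation 2: label each step by its (hypothetical) category."""
    return CATEGORY.get(step, step)


# Toy repository of three workflows.
repo = [
    [("fetch_ncbi", "blast"), ("blast", "render_tree")],
    [("fetch_ebi", "clustalw"), ("clustalw", "render_tree")],
    [("fetch_ncbi", "blast")],
]

exact = frequent_fragments(repo, by_name, min_support=2)
coarse = frequent_fragments(repo, by_category, min_support=2)
sim_exact = jaccard(encode(repo[0], by_name), encode(repo[1], by_name))
sim_coarse = jaccard(encode(repo[0], by_category), encode(repo[1], by_category))
print(exact, coarse, sim_exact, sim_coarse)
```

On this toy repository, the exact-name encoding yields a single frequent fragment, whereas the category encoding yields two and makes the first two workflows identical (Jaccard similarity 1.0 instead of 0.0). This illustrates how the choice of representation affects mining results: coarser labels surface more reuse, but may also merge steps that are not truly interchangeable.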


Copyright information

© Springer International Publishing Switzerland 2015

Open Access: This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Mariem Harmassi (1)
  • Daniela Grigori (1)
  • Khalid Belhajjame (1)
  1. LAMSADE, Paris Dauphine University, Paris, France
