Clone Detection in Repositories of Business Process Models

  • Reina Uba
  • Marlon Dumas
  • Luciano García-Bañuelos
  • Marcello La Rosa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6896)


Over time, process model repositories tend to accumulate duplicate fragments (also called clones) as new process models are created or extended by copying and merging fragments from other models. This phenomenon calls for methods to detect clones in process models, so that these clones can be refactored as separate subprocesses to improve maintainability. This paper presents an indexing structure to support the fast detection of clones in large process model repositories. The proposed index is based on a novel combination of a method for process model decomposition (specifically the Refined Process Structure Tree) with established graph canonization and string matching techniques. Experiments show that the algorithm scales to repositories with hundreds of models. The experimental results also show that a significant number of non-trivial clones can be found in process model repositories taken from industrial practice.
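
The general idea of indexing fragments by a canonical code can be illustrated with a minimal sketch. It is not the paper's index: it assumes fragments have already been extracted (the paper uses the Refined Process Structure Tree for that step), and it replaces the canonization and string matching techniques with a brute-force canonical code that is only practical for very small fragments. The names `canonical_code`, `find_clones` and the sample fragments are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def canonical_code(labels, edges):
    """Return a canonical string for a small labelled directed graph.

    labels: dict mapping node id -> task label
    edges:  iterable of (source id, target id) pairs
    Assumption: brute force over all node orderings, a stand-in for a
    real graph canonization algorithm; exponential in fragment size.
    """
    best = None
    for order in permutations(labels):
        pos = {node: i for i, node in enumerate(order)}
        label_part = tuple(labels[node] for node in order)
        edge_part = tuple(sorted((pos[s], pos[d]) for s, d in edges))
        code = (label_part, edge_part)
        if best is None or code < best:
            best = code
    return repr(best)


def find_clones(fragments):
    """Group model ids whose fragments share the same canonical code."""
    index = defaultdict(list)
    for model_id, labels, edges in fragments:
        index[canonical_code(labels, edges)].append(model_id)
    return [ids for ids in index.values() if len(ids) > 1]


# Hypothetical fragments: (model id, node labels, control-flow edges).
fragments = [
    ("order_process", {"a": "Check stock", "b": "Ship goods"}, [("a", "b")]),
    ("return_process", {"x": "Ship goods", "y": "Check stock"}, [("y", "x")]),
    ("billing_process", {"p": "Check stock", "q": "Ship goods"}, [("q", "p")]),
]

clones = find_clones(fragments)
```

Here the first two fragments differ only in node identifiers, so they receive the same code and are reported together. The third reverses the edge direction and is kept apart.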





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Reina Uba (1)
  • Marlon Dumas (1)
  • Luciano García-Bañuelos (1)
  • Marcello La Rosa (2, 3)
  1. University of Tartu, Estonia
  2. Queensland University of Technology, Australia
  3. NICTA Queensland Lab, Australia
