Towards Efficient Business Process Clustering and Retrieval: Combining Language Modeling and Structure Matching

  • Mu Qiao
  • Rama Akkiraju
  • Aubrey J. Rembert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6896)


Large organizations tend to have hundreds of business processes. Discovering and understanding similarities among business processes can be useful to organizations for a number of reasons, including better overall process management and maintenance. In this paper we present a novel and efficient approach to clustering and retrieving business processes. A given set of business processes is clustered based on underlying topic, structure, and semantic similarities. In addition, given a query business process, the top-k most similar processes are retrieved based on the clustering results. In this work, we bring together two schools of work that have not been well connected, statistical language modeling and structure matching, and combine them in a novel way. In finding similarities among business processes, our approach takes into account both high-level topic information, which can be collected from process description documents and keywords, and detailed structural features such as process control flows. This ability to work with processes that may not always have formal control flows is particularly useful in dealing with real-world business processes, which are not always described formally. We developed a system to implement our approach and evaluated it on several collections of industry best practice processes and real-world business processes at a large IT service company, described at varying levels of formality. Our experimental results reveal that retrieval based on combined language modeling and structure matching outperforms structure-matching-only techniques in both mean average precision and running time.
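The cluster-then-retrieve idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the topic proportions are hand-set here rather than learned by a topic model, Jaccard overlap of control-flow edges stands in for structure matching, and the equal weighting of the two scores is an assumption. All names (`Process`, `cluster_by_topic`, `retrieve`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    topics: tuple      # assumed topic proportions (hand-set here, not learned)
    edges: frozenset   # control-flow edges as (source, target); may be empty


def dominant_topic(p):
    """Index of the topic with the largest proportion."""
    return max(range(len(p.topics)), key=lambda i: p.topics[i])


def cosine(a, b):
    """Cosine similarity of two topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Edge-set overlap; 0.0 when either process has no formal control flow."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_topic(processes):
    """Group processes by dominant topic (a stand-in for topic-based clustering)."""
    clusters = {}
    for p in processes:
        clusters.setdefault(dominant_topic(p), []).append(p)
    return clusters


def retrieve(query, clusters, k=3, weight=0.5):
    """Rank only the processes in the query's cluster by a combined score."""
    candidates = clusters.get(dominant_topic(query), [])

    def score(p):
        return (weight * cosine(query.topics, p.topics)
                + (1 - weight) * jaccard(query.edges, p.edges))

    return sorted(candidates, key=score, reverse=True)[:k]


# Toy data, invented for illustration only.
processes = [
    Process("order_to_cash", (0.8, 0.1, 0.1), frozenset({
        ("receive order", "check credit"),
        ("check credit", "ship goods"),
        ("ship goods", "send invoice")})),
    Process("quote_to_order", (0.7, 0.2, 0.1), frozenset({
        ("receive order", "check credit"),
        ("check credit", "send quote")})),
    Process("returns", (0.6, 0.3, 0.1), frozenset()),  # text-only description
    Process("hire_employee", (0.1, 0.1, 0.8), frozenset({
        ("post job", "interview"),
        ("interview", "make offer")})),
]
clusters = cluster_by_topic(processes)
query = Process("query", (0.75, 0.15, 0.10), frozenset({
    ("receive order", "check credit"),
    ("check credit", "ship goods")}))
top = [p.name for p in retrieve(query, clusters, k=3)]
print(top)
```

In this sketch, structure is compared only within the query's cluster, which is one plausible reading of how clustering could reduce running time relative to matching the query against every process. A process with no formal control flow, such as `returns` above, still receives a score from its topic similarity alone.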


Business Process; Query Process; Latent Dirichlet Allocation; Retrieval Method; Mean Average Precision
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mu Qiao (1)
  • Rama Akkiraju (2)
  • Aubrey J. Rembert (2)

  1. Department of Computer Science and Engineering, The Pennsylvania State University, USA
  2. IBM T.J. Watson Research Center, Hawthorne, USA
