Towards Efficient Business Process Clustering and Retrieval: Combining Language Modeling and Structure Matching
Large organizations tend to have hundreds of business processes. Discovering and understanding similarities among these processes is useful for a number of reasons, including better overall process management and maintenance. In this paper we present a novel and efficient approach to clustering and retrieving business processes. A given set of business processes is clustered based on underlying topic, structure, and semantic similarities. In addition, given a query business process, the top-k most similar processes are retrieved based on the clustering results. This work brings together two schools of work that have not been well connected, statistical language modeling and structure matching, and combines them in a novel way. In finding similarities among business processes, our approach takes into account both high-level topic information, which can be collected from process description documents and keywords, and detailed structural features such as process control flows. Because it does not depend on structural features alone, the approach can work with processes that lack formal control flows, which is particularly useful for real-world business processes that are not always described formally. We developed a system that implements our approach and evaluated it on several collections of industry best-practice processes and on real-world business processes at a large IT service company, described at varying levels of formality. Our experimental results show that retrieval combining language modeling and structure matching outperforms structure-matching-only techniques in both mean average precision and running time.
Keywords: Business Process, Query Process, Latent Dirichlet Allocation, Retrieval Method, Mean Average Precision
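The abstract reports retrieval quality as mean average precision (MAP) over the top-k results. As a reading aid, the following is a minimal sketch of the standard MAP definition, not the authors' evaluation code. The function names and the process identifiers (`p1`, `p2`, ...) are hypothetical, and the sketch assumes each ranked list contains no duplicate entries.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against its set of relevant items.

    Assumes `ranked` has no duplicates. Returns 0.0 when nothing is relevant.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only at ranks holding a relevant item.
            precision_sum += hits / rank
    # Normalize by the number of relevant items, so missed items lower the score.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision over (ranked list, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical query results: process identifiers are illustrative only.
query_a = (["p3", "p1", "p7", "p2"], {"p1", "p2"})
query_b = (["p5", "p9"], {"p5"})
map_score = mean_average_precision([query_a, query_b])
```

In this example the relevant processes for the first query appear at ranks 2 and 4, giving an average precision of (1/2 + 2/4) / 2 = 0.5. The second query places its single relevant process first, giving 1.0, so the MAP over both queries is 0.75.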