A Multi-view Learning Approach to the Discovery of Deviant Process Instances

Cuzzocrea, Alfredo; Folino, Francesco; Guarascio, Massimo; Pontieri, Luigi

doi:10.1007/978-3-319-26148-5_9

Conference paper
First Online: 28 October 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9415)

Abstract

Increasing attention has recently been paid to the problem of detecting and explaining “deviant” process instances, i.e., instances diverging from normal or desired outcomes (e.g., frauds, faults, SLA violations), based on log data. Current solutions discriminate between deviant and normal instances by combining the extraction of (sequence-based) behavioral patterns with standard classifier-induction methods. However, there is no general consensus on which kinds of patterns are most suitable for this task, while mixing multiple pattern families together produces a cumbersome, redundant representation of log data that may well confuse the learner. We propose an ensemble-learning approach to this deviance mining task, where multiple base learners are trained on different feature-based views of the given log (each obtained by using a distinct family of patterns). The final model, induced through a stacking procedure, can implicitly reason on heterogeneous kinds of structural features by leveraging the predictions of the base models. To make the discovered models more effective, the approach leverages resampling techniques and exploits non-structural process data. The approach was implemented and tested on real-life logs, where it achieved compelling performance compared with state-of-the-art methods.
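The multi-view stacking idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy log, the two views (`activity_view`, `bigram_view`), the nearest-centroid base learner (`CentroidScorer`), and the logistic-regression meta-learner. Resampling and non-structural data, which the abstract also mentions, are left out.

```python
import math
from collections import Counter

# Hypothetical toy log: each trace is a list of activity names; label 1 = deviant.
TRACES = [
    ["a", "b", "c", "d"], ["a", "b", "c", "d"],
    ["a", "b", "b", "c", "d"], ["a", "b", "c", "c", "d"],
    ["a", "c", "b", "d"], ["a", "c", "b", "d"],
    ["a", "b", "x", "d"], ["a", "x", "c", "d"],
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

def activity_view(trace):
    """View 1 (assumed pattern family): counts of individual activities."""
    return Counter(trace)

def bigram_view(trace):
    """View 2 (assumed pattern family): counts of consecutive activity pairs."""
    return Counter(zip(trace, trace[1:]))

class CentroidScorer:
    """Toy base learner: compares a feature vector with per-class mean vectors."""

    def fit(self, feats, labels):
        self.centroids = {}
        for cls in (0, 1):
            rows = [f for f, y in zip(feats, labels) if y == cls]
            total = Counter()
            for f in rows:
                total.update(f)
            self.centroids[cls] = {k: v / len(rows) for k, v in total.items()}
        return self

    def score(self, feat):
        """Return a value in [0, 1]; higher means closer to the deviant centroid."""
        def dist(centroid):
            keys = set(feat) | set(centroid)
            return math.sqrt(sum((feat.get(k, 0) - centroid.get(k, 0)) ** 2 for k in keys))
        d0, d1 = dist(self.centroids[0]), dist(self.centroids[1])
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

def out_of_fold_scores(view, traces, labels):
    """Leave-one-out scores, so the meta-learner never sees in-sample predictions."""
    feats = [view(t) for t in traces]
    scores = []
    for i in range(len(feats)):
        model = CentroidScorer().fit(feats[:i] + feats[i + 1:], labels[:i] + labels[i + 1:])
        scores.append(model.score(feats[i]))
    return scores

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, epochs=1000, lr=0.5):
    """Meta-learner: plain logistic regression trained by stochastic gradient ascent."""
    w = [0.0] * (len(rows[0]) + 1)  # w[0] is the bias
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            w[0] += lr * err
            for j, xj in enumerate(x):
                w[j + 1] += lr * err * xj
    return w

class MultiViewStack:
    """One base learner per view; a meta-learner combines their scores."""

    def __init__(self, views):
        self.views = views

    def fit(self, traces, labels):
        columns = [out_of_fold_scores(v, traces, labels) for v in self.views]
        self.meta_w = fit_logistic(list(zip(*columns)), labels)
        self.base = [CentroidScorer().fit([v(t) for t in traces], labels) for v in self.views]
        return self

    def predict_proba(self, trace):
        x = [m.score(v(trace)) for m, v in zip(self.base, self.views)]
        return sigmoid(self.meta_w[0] + sum(w * xi for w, xi in zip(self.meta_w[1:], x)))

model = MultiViewStack([activity_view, bigram_view]).fit(TRACES, LABELS)
# Swapped activity order: invisible to the activity view, visible to the bigram view.
print(model.predict_proba(["a", "c", "b", "d"]))
print(model.predict_proba(["a", "b", "c", "d"]))
```

The two query traces contain the same activities, so the first view cannot tell them apart. The stacked model separates them only because the meta-learner also receives the bigram-based score, which is the sense in which the final model can reason over heterogeneous structural features without merging them into one redundant feature space.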

Author information

Authors and Affiliations

ICAR Institute, National Research Council, Rende, Italy
Alfredo Cuzzocrea, Francesco Folino, Massimo Guarascio & Luigi Pontieri
DIA Department, University of Trieste, Trieste, Italy
Alfredo Cuzzocrea

Corresponding author

Correspondence to Alfredo Cuzzocrea.

Editor information

Editors and Affiliations

Trinity College Dublin, Dublin 2, Ireland
Christophe Debruyne
University of Lorraine, Vandoeuvre-les-Nancy Cedex, France
Hervé Panetto
TU Graz, Graz, Austria
Robert Meersman
La Trobe University, Melbourne, Australia
Tharam Dillon
PROFACTOR GmbH, Steyr-Gleink, Austria
Georg Weichhart
Drexel University, Philadelphia, Pennsylvania, USA
Yuan An
Università degli Studi di Milano, Crema, Italy
Claudio Agostino Ardagna

About this paper

Cite this paper

Cuzzocrea, A., Folino, F., Guarascio, M., Pontieri, L. (2015). A Multi-view Learning Approach to the Discovery of Deviant Process Instances. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2015 Conferences. OTM 2015. Lecture Notes in Computer Science, vol 9415. Springer, Cham. https://doi.org/10.1007/978-3-319-26148-5_9

DOI: https://doi.org/10.1007/978-3-319-26148-5_9
Published: 28 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26147-8
Online ISBN: 978-3-319-26148-5
eBook Packages: Computer Science, Computer Science (R0)
