Reducing Event Variability in Logs by Clustering of Word Embeddings

Sánchez-Charles, David; Carmona, Josep; Muntés-Mulero, Victor; Solé, Marc

doi:10.1007/978-3-319-74030-0_14

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 308)

Included in the following conference series:

International Conference on Business Process Management

Abstract

Several business-to-business and business-to-consumer services are provided as a human-to-human conversation in which the provider's representative guides the conversation towards its resolution based on her experience and on internal guidelines. Attempts to automate these services are becoming popular, but they are currently limited to the procedures and objectives set at design time. Process discovery techniques could provide the necessary mechanisms to monitor event logs derived from textual conversations and expand the capabilities of conversational bots. Still, the variability of textual messages hinders the utility of process discovery techniques by producing unstructured process models that are hard to understand. In this paper, we propose using word embeddings to combine events that have semantically similar names.

Notes

1.
For instance, a faster customer support channel leads to lower customer churn rates. https://www.salesforce.com/blog/2017/03/effective-strategies-to-reduce-customer-churn.html.
2.
For the sake of simplicity, the definitions and examples of the paper are tailored to the context of conversations between humans and, possibly, computers. In spite of this, the theory of the paper can be applied to general event logs as defined in [18].
3.
We follow the classical definition \(idf(w) = \log \frac{\text {Number of documents}}{\text {Occurrences of } w}\).
4.
During the evaluation of this approach, we set c to 1.2 and b to 0.75 as proposed by [11].
5.
i.e. a finite collection of sets \(\{ E_i \}_{i \in I}\) such that \(\cup _{i \in I} E_i = E\) and \(E_i \cap E_j = \emptyset \) for any \(i \not = j\).
6.
https://en.wikipedia.org/wiki/Wikipedia:List_of_guidelines.
7.
8th August 2016. The dataset is publicly available on data.4tu.nl [16].
8.
https://en.wikipedia.org/wiki/Wikipedia:Featured_articles.
9.
The flower model is a model that allows any possible behavior.
10.
We run the infrequent version of the Inductive Miner, with default parameters, on ProM 6.5.1.
11.
Results are consistent with respect to a \(20\%\)-out cross-validation.
12.
ISO 24617-2 defines 57 generic communicative functions, which one may enrich or refine depending on domain knowledge.
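
Notes 3 and 5 lend themselves to a small worked example. The Python sketch below is illustrative only and is not the authors' implementation: the function names `idf_table` and `is_partition` are hypothetical, and "occurrences of w" in note 3 is read here as the number of documents that contain w, which is an assumption about the intended meaning.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set


def idf_table(documents: List[List[str]]) -> Dict[str, float]:
    """idf(w) = log(number of documents / occurrences of w), as in note 3.

    Assumption: "occurrences of w" is the number of documents containing w,
    so a word repeated inside one document is counted once.
    """
    n_documents = len(documents)
    document_frequency: Counter = Counter()
    for document in documents:
        document_frequency.update(set(document))
    return {
        word: math.log(n_documents / count)
        for word, count in document_frequency.items()
    }


def is_partition(blocks: Iterable[Set[Hashable]], universe: Set[Hashable]) -> bool:
    """Check the two conditions of note 5: the blocks are pairwise disjoint
    and their union is exactly the set E (here called universe)."""
    seen: Set[Hashable] = set()
    for block in blocks:
        if seen & block:  # an element already covered: blocks overlap
            return False
        seen |= block
    return seen == universe


if __name__ == "__main__":
    # Hypothetical tokenised messages, each treated as one document.
    messages = [["reset", "password"], ["password", "help"], ["billing"]]
    print(idf_table(messages))

    # Hypothetical event names grouped into clusters.
    events = {"reset password", "password help", "billing"}
    clusters = [{"reset password", "password help"}, {"billing"}]
    print(is_partition(clusters, events))
```

Under this reading, a word that appears in every document gets an idf of zero, so it contributes nothing when the similarity of two event names is weighted by idf.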

References

1. Adriansyah, A., Munoz-Gama, J., Carmona, J., van Dongen, B.F., van der Aalst, W.M.P.: Measuring precision of modeled behavior. Inf. Syst. E-bus. Manag. 13(1), 37–67 (2015)
2. Adriansyah, A., van Dongen, B.F., van der Aalst, W.M.P.: Conformance checking using cost-based fitness analysis. In: Proceedings of the 2011 IEEE 15th International Enterprise Distributed Object Computing Conference, EDOC 2011, Washington, DC, USA, pp. 55–64. IEEE Computer Society (2011)
3. Baier, T., Mendling, J., Weske, M.: Bridging abstraction layers in process mining. Inf. Syst. 46, 123–139 (2014)
4. Baroni, M., Dinu, G., Kruszewski, G.: Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In: Proceedings of Association for Computational Linguistics (ACL), vol. 1 (2014)
5. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3(Feb), 1137–1155 (2003)
6. Jagadeesh Chandra Bose, R.P., van der Aalst, W.M.P.: Abstractions in process mining: a taxonomy of patterns. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS, vol. 5701, pp. 159–175. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03848-8_12
7. Da Silva, G.A., Ferreira, D.R.: Applying hidden Markov models to process mining. Sistemas e Tecnologias de Informação. AISTI/FEUP/UPF (2009)
8. Günther, C.W., Rozinat, A., van der Aalst, W.M.P.: Activity mining by global trace segmentation. In: Rinderle-Ma, S., Sadiq, S., Leymann, F. (eds.) BPM 2009. LNBIP, vol. 43, pp. 128–139. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12186-9_13
9. Günther, C.W., van der Aalst, W.M.P.: Mining activity clusters from low-level event logs. Beta, Research School for Operations Management and Logistics (2006)
10. He, Z., Liu, X., Lv, P., Wu, J.: Hidden softmax sequence model for dialogue structure analysis. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (2016)
11. Kenter, T., de Rijke, M.: Short text similarity with word embeddings. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM 2015, pp. 1411–1420. ACM, New York (2015)
12. Klinkmüller, C., Weber, I., Mendling, J., Leopold, H., Ludwig, A.: Increasing recall of process model matching by improved activity label matching. In: Daniel, F., Wang, J., Weber, B. (eds.) BPM 2013. LNCS, vol. 8094, pp. 211–218. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40176-3_17
13. Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, 21–26 June 2014, Beijing, China, pp. 1188–1196 (2014)
14. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. CoRR, abs/1310.4546 (2013)
15. Morelli, R.A., Bronzino, J.D., Goethe, J.W.: A computational speech-act model of human-computer conversations. In: Proceedings of the 1991 IEEE Seventeenth Annual Northeast Bioengineering Conference, pp. 263–264. IEEE (1991)
16. Sanchez-Charles, D.: Title and subtitles of Wikipedia articles (2017). https://doi.org/10.4121/uuid:61fb9665-40ab-4b70-8214-767c521cc950
17. Tax, N., Sidorova, N., Haakma, R., van der Aalst, W.M.P.: Event abstraction for process mining using supervised learning techniques. CoRR, abs/1606.07283 (2016)
18. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer, Berlin (2011)
19. van der Aalst, W.M.P., Günther, C.W.: Finding structure in unstructured processes: the case for process mining. In: ACSD, pp. 3–12. IEEE Computer Society (2007)

Acknowledgements

This work is funded by the Secretaria de Universitats i Recerca of the Generalitat de Catalunya, under the Industrial Doctorate Program 2013DI062, and by the Spanish Ministry for Economy and Competitiveness and the European Union (FEDER funds) under grant COMMAS (Ref. TIN2013-46181-C2-1-R).

Author information

Authors and Affiliations

CA Strategic Research, CA Technologies, Barcelona, Spain
David Sánchez-Charles, Victor Muntés-Mulero & Marc Solé
Universitat Politècnica de Catalunya, Barcelona, Spain
Josep Carmona

Corresponding author

Correspondence to David Sánchez-Charles.

Editor information

Editors and Affiliations

Department of Service and Information System Engineering, Universitat Politècnica de Catalunya, Barcelona, Spain
Ernest Teniente
Humboldt-Universität zu Berlin, Berlin, Berlin, Germany
Matthias Weidlich

About this paper

Cite this paper

Sánchez-Charles, D., Carmona, J., Muntés-Mulero, V., Solé, M. (2018). Reducing Event Variability in Logs by Clustering of Word Embeddings. In: Teniente, E., Weidlich, M. (eds) Business Process Management Workshops. BPM 2017. Lecture Notes in Business Information Processing, vol 308. Springer, Cham. https://doi.org/10.1007/978-3-319-74030-0_14

DOI: https://doi.org/10.1007/978-3-319-74030-0_14
Published: 17 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74029-4
Online ISBN: 978-3-319-74030-0
eBook Packages: Computer Science, Computer Science (R0)
