Visualizing and exploring event databases: a methodology to benefit from process analytics

  • Pavlos Delias
  • Vassilios Zoumpoulidis
  • Ioannis Kazanidis
Original paper

Abstract

Events, routinely broadcast by news media all over the world, are captured and recorded in event databases in standardized formats. This wealth of information can be aggregated and visualized in several ways to produce appealing illustrations. However, existing aggregation techniques tend to treat events either as fragmentary or as part of a strictly sequential chain. Yet event occurrences may follow varying structures (i.e., structures other than a sequence), reflecting elements of a larger, implicit process. In this work, we propose a methodology that helps analysts gain richer insights from event datasets by enabling a process perspective. Through a case study about a political phenomenon, we provide concrete recommendations on data reviewing, process discovery, and visually facilitated interpretation. We furthermore discuss the methodological and epistemological aspects that must be addressed to make our approach applicable to event analytics.

Keywords

Event data, Process mining, Process analytics

Notes

Acknowledgements

We would like to thank our graduate students Zafeiris Papavaritis and Christianna Pantermali, who spent many hours checking every event of the original dataset for relevance and manually filtering out the irrelevant ones.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Eastern Macedonia and Thrace Institute of Technology, Kavala, Greece
