SSRDVis: Interactive visualization for event sequences summarization and rare detection

Abstract

This paper presents SSRDVis, a visual approach for summarizing event sequences and interactively detecting rare behaviors. SSRDVis has three main components: (1) a sequence embedding module that learns feature vectors for sequences, (2) a sequence grouping and summarization module that finds representative clusters and patterns in the dataset, and (3) a rare detection module that discovers and explains rare cases. Sequences are embedded into a vector space via “mixed-ngram2vec,” which is adapted from “word2vec.” Unsupervised learning models can then be applied in that vector space to group similar sequences and detect anomalies. In addition, sequential pattern graphs are built to provide a compact, semantic summary of the sequences. These components work together to present both overall sequential patterns and abnormal behaviors in a single visual interface. We demonstrate the feasibility of the approach by applying it to Web clickstream analysis. Experimental results show that the approach helps identify noticeable patterns, especially rare behaviors, in a large number of event sequences.
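The idea of treating n-grams of several lengths as tokens can be illustrated with a small sketch. The abstract does not specify how “mixed-ngram2vec” builds its tokens or which anomaly detector is used, so everything below is an assumption made for illustration: the function names `mixed_ngrams` and `rarity_scores`, the `max_n` parameter, and the toy clickstream data are hypothetical, and the frequency-based rarity score is a simple stand-in for the embedding and unsupervised detection steps, not the paper's method.

```python
from collections import Counter


def mixed_ngrams(sequence, max_n=3):
    """Return all contiguous n-grams of length 1..max_n as tuples."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(sequence) - n + 1):
            grams.append(tuple(sequence[i:i + n]))
    return grams


def rarity_scores(sequences, max_n=3):
    """Score each sequence by the mean rarity of its distinct n-grams.

    Assumption: this frequency-based score only stands in for the
    embedding plus anomaly-detection stage described in the abstract.
    """
    doc_freq = Counter()  # number of sequences containing each n-gram
    per_sequence = []
    for seq in sequences:
        grams = set(mixed_ngrams(seq, max_n))
        per_sequence.append(grams)
        doc_freq.update(grams)

    total = len(sequences)
    scores = []
    for grams in per_sequence:
        if not grams:
            scores.append(0.0)
            continue
        # An n-gram found in every sequence contributes 0; a unique one
        # contributes close to 1.
        mean_rarity = sum(1 - doc_freq[g] / total for g in grams) / len(grams)
        scores.append(mean_rarity)
    return scores


# Hypothetical clickstreams: each sequence is a list of page categories.
clickstreams = [
    ["home", "news", "sports"],
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["travel", "bbs", "travel"],
]
scores = rarity_scores(clickstreams, max_n=2)
rarest = max(range(len(clickstreams)), key=scores.__getitem__)
print(rarest, scores)
```

In a fuller pipeline, the token lists from `mixed_ngrams` would presumably be fed to a word2vec-style trainer, and the resulting sequence vectors to a clustering or outlier-detection model in place of `rarity_scores`.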

Acknowledgements

This work is supported by the National Key Research and Development Program of China (Grant No. 2017YFB0701900), the National Natural Science Foundation of China (Grant No. 61100053), and the Key Laboratory of Machine Perception at Peking University (K-2019-09).

Author information

Correspondence to Xiaoju Dong.

Cite this article

Li, C., Dong, X., Liu, W. et al. SSRDVis: Interactive visualization for event sequences summarization and rare detection. J Vis 23, 171–184 (2020). https://doi.org/10.1007/s12650-019-00609-x

Keywords

  • Visual analytics
  • Event sequences
  • Sequential pattern mining
  • Rare detection