Fast Discovery of Generalized Sequential Patterns

  • Marzena Kryszkiewicz
  • Łukasz Skonieczny
Chapter
Part of the Studies in Big Data book series (SBD, volume 40)

Abstract

Knowledge in the form of generalized sequential patterns finds many applications. In this paper, we focus on optimizing GSP, a well-known algorithm for discovering such patterns. Our optimization consists of a more selective identification of the nodes to be visited while traversing the hash tree that stores candidates for generalized sequential patterns. It relies on the fact that the elements of candidate sequences are stored as ordered sets of items. To reduce the number of visited nodes in the hash tree further, we also propose using not only the parameters windowSize and maxGap, as in the original GSP, but also the parameter minGap. As a result, the number of candidates that require the final, time-consuming verification may decrease considerably. In the experiments we carried out, our optimized variant of GSP was several times faster than the standard GSP.
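
To make the roles of windowSize, minGap and maxGap concrete, the following is a minimal brute-force sketch in Python of the kind of containment test that the final verification step has to perform, assuming the standard GSP definition of these constraints. It is not the hash-tree optimization proposed in the chapter. The function name `contains`, the `Transaction` layout and the assumption of distinct transaction times are illustrative choices, not taken from the source.

```python
from typing import List, Set, Tuple

# Hypothetical layout: one transaction is (transaction time, set of items).
Transaction = Tuple[int, Set[str]]


def contains(data: List[Transaction],
             candidate: List[Set[str]],
             window_size: int,
             min_gap: int,
             max_gap: int) -> bool:
    """Exhaustively test whether a data sequence supports a candidate.

    Element i of the candidate must be covered by the union of the
    transactions with indices l_i..u_i, subject to:
      time(u_i) - time(l_i)     <= window_size
      time(l_i) - time(u_{i-1}) >  min_gap
      time(u_i) - time(l_{i-1}) <= max_gap
    Transaction times are assumed to be distinct.
    """
    data = sorted(data, key=lambda t: t[0])
    m = len(data)

    def search(i: int, start: int, prev_l_time: int, prev_u_time: int) -> bool:
        if i == len(candidate):
            return True  # every element has been matched
        for l in range(start, m):
            l_time = data[l][0]
            # minGap: this element must start late enough after the previous one.
            if i > 0 and l_time - prev_u_time <= min_gap:
                continue
            covered: Set[str] = set()
            for u in range(l, m):
                u_time = data[u][0]
                # windowSize: one element may span only a bounded time window.
                if u_time - l_time > window_size:
                    break
                # maxGap: this element must end soon enough after the
                # previous element started.
                if i > 0 and u_time - prev_l_time > max_gap:
                    break
                covered |= data[u][1]
                if candidate[i] <= covered and search(i + 1, u + 1, l_time, u_time):
                    return True
        return False

    return search(0, 0, 0, 0)


data = [(1, {"a"}), (2, {"b"}), (5, {"c"})]
candidate = [{"a", "b"}, {"c"}]
print(contains(data, candidate, window_size=1, min_gap=0, max_gap=10))
```

In this toy data sequence, the element {a, b} can only be matched by merging the transactions at times 1 and 2, so the candidate is supported only when windowSize is at least 1. Tightening minGap or maxGap rules out the match with the transaction at time 5. Because each such check is exhaustive, pruning candidates before they reach it is what makes a more selective hash-tree traversal worthwhile.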

Keywords

Data mining, Sequential patterns, Generalized sequential patterns, GSP

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
