Constraint-Based Sequence Mining Using Constraint Programming

  • Benjamin Negrevergne
  • Tias Guns
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9075)

Abstract

The goal of constraint-based sequence mining is to find sequences of symbols that are included in a large number of input sequences and that satisfy some constraints specified by the user. Many constraints have been proposed in the literature, but a general framework is still missing. We investigate the use of constraint programming as a general framework for this task.
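To make the task concrete, the following minimal sketch checks whether a pattern is included in a sequence and counts how many input sequences include it. It assumes that inclusion means occurrence as a subsequence with gaps allowed, the usual definition in sequential pattern mining. The names `is_embedded`, `support` and `satisfies`, the toy database and the maximum-length constraint are all hypothetical and serve only as illustration; this is not the constraint programming formulation of the paper.

```python
from typing import List, Sequence

# Toy database (illustrative only): each string is one input sequence of symbols.
DATABASE: List[str] = ["abcd", "acbd", "bd", "abd"]


def is_embedded(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` as a subsequence (gaps allowed)."""
    remaining = iter(sequence)
    # `symbol in remaining` consumes the iterator up to the first match,
    # so the symbols of the pattern must appear in order.
    return all(symbol in remaining for symbol in pattern)


def support(pattern: Sequence[str], database: List[str]) -> int:
    """Count the input sequences that include `pattern`."""
    return sum(is_embedded(pattern, sequence) for sequence in database)


def satisfies(pattern: Sequence[str], database: List[str],
              min_support: int, max_length: int) -> bool:
    """Minimum frequency plus one example user constraint (maximum pattern length)."""
    return len(pattern) <= max_length and support(pattern, database) >= min_support
```

For instance, `support("ad", DATABASE)` counts the sequences that contain an `a` followed, not necessarily immediately, by a `d`.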

We first identify four categories of constraints that are applicable to sequence mining. We then propose two constraint programming formulations. The first formulation introduces a new global constraint called exists-embedding. This formulation is the most efficient but does not support one type of constraint. To support such constraints, we develop a second formulation that is more general but incurs more overhead. Both formulations can use the projected database technique used in specialised algorithms.
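As a rough illustration of the projected database technique mentioned above, the sketch below follows the general idea of prefix projection used in pattern-growth algorithms such as PrefixSpan: after extending the current prefix with a symbol, only the part of each sequence after the first occurrence of that symbol is kept. It assumes single-character symbols and a plain depth-first enumeration. The functions `project` and `mine` are hypothetical names, and the code shows neither of the two constraint programming formulations.

```python
from collections import Counter
from typing import Dict, List


def project(database: List[str], symbol: str) -> List[str]:
    """Keep, for each sequence containing `symbol`, the suffix after its first occurrence."""
    projected = []
    for sequence in database:
        position = sequence.find(symbol)
        if position != -1:
            projected.append(sequence[position + 1:])
    return projected


def mine(database: List[str], min_support: int, prefix: str = "") -> Dict[str, int]:
    """Enumerate all frequent sequences depth-first, with their support."""
    results: Dict[str, int] = {}
    # Count each symbol at most once per (projected) sequence.
    counts = Counter(symbol for sequence in database for symbol in set(sequence))
    for symbol, count in sorted(counts.items()):
        if count >= min_support:
            pattern = prefix + symbol
            results[pattern] = count
            # Recurse on the database projected on the extended prefix.
            results.update(mine(project(database, symbol), min_support, pattern))
    return results
```

Because each recursive call only scans the suffixes that can still extend the current prefix, the projected database shrinks as patterns grow, which is the main motivation for the technique.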

Experiments demonstrate the flexibility of the approach in constraint-based settings and compare it to existing methods.

Keywords

Sequential pattern mining, Sequence mining, Episode mining, Constrained pattern mining, Constraint programming, Declarative programming

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. DTAI Research group, KU Leuven, Leuven, Belgium
