Abstract
The goal of constraint-based sequence mining is to find sequences of symbols that are included in a large number of input sequences and that satisfy some constraints specified by the user. Many constraints have been proposed in the literature, but a general framework is still missing. We investigate the use of constraint programming as a general framework for this task.
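To make the task concrete, here is a minimal Python sketch of the two ingredients named above: checking that a pattern is included in an input sequence (as a subsequence, gaps allowed) and counting how many input sequences include it. The function names, the toy database, and the `max_length` constraint are illustrative assumptions, not taken from the paper, and this is a plain procedural check rather than the constraint programming formulation.

```python
from typing import Iterable, Sequence


def is_embedded(pattern: Sequence[str], sequence: Iterable[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` as a subsequence (gaps allowed)."""
    it = iter(sequence)
    # `symbol in it` consumes the iterator up to the first match,
    # so later symbols can only match at later positions.
    return all(symbol in it for symbol in pattern)


def support(pattern: Sequence[str], database: Iterable[Sequence[str]]) -> int:
    """Number of input sequences that include `pattern`."""
    return sum(1 for seq in database if is_embedded(pattern, seq))


def is_solution(pattern, database, min_support: int, max_length: int) -> bool:
    """Frequency constraint plus one hypothetical user constraint (maximum length)."""
    return len(pattern) <= max_length and support(pattern, database) >= min_support


# Toy database (assumed for illustration): each string is a sequence of symbols.
db = ["ABCB", "ACB", "BAC", "CAB"]
print(support("AB", db))  # 3: "BAC" has no B after its A
```

A mining system has to search over all candidate patterns rather than test a single one; the sketch only shows what a solution must satisfy.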
We first identify four categories of constraints that are applicable to sequence mining. We then propose two constraint programming formulations. The first formulation introduces a new global constraint called exists-embedding. This formulation is the most efficient but does not support one type of constraint. To support such constraints, we develop a second formulation that is more general but incurs more overhead. Both formulations can use the projected database technique used in specialised algorithms.
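The projected database technique mentioned above comes from specialised pattern-growth algorithms such as PrefixSpan. A rough Python sketch of the idea, under the assumption that sequences are strings of single-character symbols: after extending the pattern with a symbol, only the suffix following that symbol's first occurrence in each sequence remains relevant. The names `project` and `frequent_sequences` are hypothetical, and the code illustrates the specialised technique, not how the paper integrates it into a constraint solver.

```python
from collections import Counter


def project(database, symbol):
    """Keep, for each sequence containing `symbol`, the suffix after its first occurrence."""
    projected = []
    for seq in database:
        if symbol in seq:
            projected.append(seq[seq.index(symbol) + 1:])
    return projected


def frequent_sequences(database, min_support, prefix=()):
    """Depth-first pattern growth: extend `prefix` using the projected database only."""
    counts = Counter()
    for seq in database:
        counts.update(set(seq))  # count each symbol at most once per sequence
    results = []
    for symbol in sorted(counts):
        if counts[symbol] >= min_support:
            pattern = prefix + (symbol,)
            results.append((pattern, counts[symbol]))
            # Recurse on the suffixes only; sequences without `symbol` are dropped.
            results.extend(
                frequent_sequences(project(database, symbol), min_support, pattern)
            )
    return results


# Toy database (assumed for illustration).
db = ["ABCB", "ACB", "BAC", "CAB"]
for pattern, count in frequent_sequences(db, min_support=3):
    print("".join(pattern), count)
```

Because each recursive call works on shorter suffixes, the support of an extended pattern is counted without rescanning the full input sequences.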
Experiments demonstrate the flexibility of the approach in constraint-based settings and compare it to existing methods.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Negrevergne, B., Guns, T. (2015). Constraint-Based Sequence Mining Using Constraint Programming. In: Michel, L. (ed.) Integration of AI and OR Techniques in Constraint Programming. CPAIOR 2015. Lecture Notes in Computer Science, vol 9075. Springer, Cham. https://doi.org/10.1007/978-3-319-18008-3_20
DOI: https://doi.org/10.1007/978-3-319-18008-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18007-6
Online ISBN: 978-3-319-18008-3
eBook Packages: Computer Science (R0)