Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks

Guyet, Thomas; Moinard, Yves; Quiniou, René; Schaub, Torsten

doi:10.1007/978-3-319-65406-5_3

Chapter
First Online: 11 October 2017

Part of the book series: Studies in Computational Intelligence (SCI, volume 732)

Abstract

This article presents the use of Answer Set Programming (ASP) to mine sequential patterns. ASP is a high-level declarative logic programming paradigm for encoding combinatorial and optimization problems as well as knowledge representation and reasoning tasks. ASP is therefore a good candidate for implementing pattern mining with background knowledge, a long-standing issue in data mining. We propose encodings of the classical sequential pattern mining tasks within two representations of embeddings (fill-gaps versus skip-gaps) and for various kinds of patterns: frequent, constrained and condensed. We compare the computational performance of these encodings with each other to gain insight into the efficiency of ASP encodings. The results show that the fill-gaps strategy performs better on real problems because of its lower memory consumption. Finally, our proposal shows performance comparable to that of a constraint programming approach (CPSM), another declarative programming paradigm.
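To make the notion of an embedding concrete, the following is a minimal Python sketch, not the chapter's ASP encodings. It assumes sequences of single items and reads the two strategy names in the usual way: a skip-gaps occurrence (l, p) records that the l-th pattern item is matched exactly at position p, whereas a fill-gaps occurrence (l, p) records that the first l pattern items are embedded somewhere within the first p positions. The function names `skip_gaps`, `fill_gaps` and `support`, and the toy database, are hypothetical and introduced only for illustration.

```python
from typing import Callable, List, Sequence, Set, Tuple

Occurrences = Set[Tuple[int, int]]


def skip_gaps(pattern: Sequence[str], seq: Sequence[str]) -> Occurrences:
    """Pairs (l, p): pattern item l matches seq at position p (1-based),
    and pattern item l-1 is matched at some earlier position q < p."""
    occ: Occurrences = set()
    for l, item in enumerate(pattern, start=1):
        for p, s in enumerate(seq, start=1):
            if s != item:
                continue
            # Looks back over all earlier positions (the "skipped" gap).
            if l == 1 or any((l - 1, q) in occ for q in range(1, p)):
                occ.add((l, p))
    return occ


def fill_gaps(pattern: Sequence[str], seq: Sequence[str]) -> Occurrences:
    """Pairs (l, p): the first l pattern items are embedded in seq[1..p]."""
    occ: Occurrences = set()
    for l, item in enumerate(pattern, start=1):
        for p, s in enumerate(seq, start=1):
            matched_here = s == item and (l == 1 or (l - 1, p - 1) in occ)
            carried_over = (l, p - 1) in occ  # "fills" the gap forward
            if matched_here or carried_over:
                occ.add((l, p))
    return occ


def support(pattern: Sequence[str],
            database: List[Sequence[str]],
            strategy: Callable[[Sequence[str], Sequence[str]], Occurrences]) -> int:
    """Number of sequences in which the whole (non-empty) pattern is embedded."""
    n = len(pattern)
    return sum(1 for seq in database
               if any(l == n for l, _ in strategy(pattern, seq)))


# Toy database (illustrative only).
database = [list("abcd"), list("acbd"), list("bd"), list("aabb")]
print(support(["a", "b"], database, skip_gaps))  # same value as with fill_gaps
print(support(["a", "b"], database, fill_gaps))
```

Both representations yield the same support; they differ only in which intermediate facts are recorded. In this sketch the skip-gaps check inspects every earlier position, while the fill-gaps check only inspects the immediately preceding one.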

Notes

1. It is important to note that the scope of a variable is the rule, and that every occurrence of a variable in a rule represents the same value.
2. clingo is fully compliant with the recent ASP standard: https://www.mat.unical.it/aspcomp2013/ASPStandardization.
3. A similar encoding can be obtained for the fill-gaps strategy by applying the same changes as above.
4. asprin (Brewka et al. 2015) is a clingo extension that allows for this kind of comparison. For more details about the use of asprin to extract skypatterns, see Gebser et al. (2016).
5. https://potassco.org/.
6. The generator and databases used in our experiments are available at https://sites.google.com/site/aspseqmining.
7. The use of the subset-minimal heuristic keeps the solving of the maximal patterns problem complete.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data (pp. 207–216).
Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In Proceedings of the International Conference on Data Engineering (pp. 3–14).
Biere, A., Heule, M., van Maaren, H., & Walsh, T. (2009). Handbook of satisfiability. Frontiers in artificial intelligence and applications (Vol. 185). IOS Press.
Bonchi, F., Giannotti, F., Lucchese, C., Orlando, S., Perego, R., & Trasarti, R. (2006). Conquest: A constraint-based querying system for exploratory pattern discovery. In Proceedings of the International Conference on Data Engineering (pp. 159–159).
Boulicaut, J.-F., & Jeudy, B. (2005). Constraint-based data mining. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook (pp. 399–416). US: Springer.
Brewka, G., Delgrande, J. P., Romero, J., & Schaub, T. (2015). Asprin: Customizing answer set preferences without a headache. In Proceedings of the Conference on Artificial Intelligence (AAAI) (pp. 1467–1474).
Bruynooghe, M., Blockeel, H., Bogaerts, B., De Cat, B., De Pooter, S., Jansen, J., et al. (2015). Predicate logic as a modeling language: Modeling and solving some machine learning and data mining problems with IDP3. Theory and Practice of Logic Programming, 15(06), 783–817.
Coletta, R., & Negrevergne, B. (2016). A SAT model to mine flexible sequences in transactional datasets. arXiv:1604.00300.
Coquery, E., Jabbour, S., Saïs, L., & Salhi, Y. (2012). A SAT-Based approach for discovering frequent, closed and maximal patterns in a sequence. In Proceedings of European Conference on Artificial Intelligence (ECAI) (pp. 258–263).
Dao, T., Duong, K., & Vrain, C. (2015). Constrained minimum sum of squares clustering by constraint programming. In Proceedings of Principles and Practice of Constraint Programming (pp. 557–573).
De Raedt, L. (2015). Languages for learning and mining. In Proceedings of the Conference on Artificial Intelligence (AAAI) (pp. 4107–4111).
Garofalakis, M., Rastogi, R., & Shim, K. (1999). SPIRIT: Sequential pattern mining with regular expression constraints. In Proceedings of the International Conference on Very Large Data Bases (pp. 223–234).
Gebser, M., Guyet, T., Quiniou, R., Romero, J., & Schaub, T. (2016). Knowledge-based sequence mining with ASP. In Proceedings of International Joint Conference on Artificial Intelligence (pp. 1497–1504).
Gebser, M., Kaminski, R., Kaufmann, B., Ostrowski, M., Schaub, T., & Schneider, M. (2011). Potassco: The Potsdam answer set solving collection. AI Communications, 24(2), 107–124.
Gebser, M., Kaminski, R., Kaufmann, B., & Schaub, T. (2014). Clingo = ASP + control: Preliminary report. In Technical Communications of the Thirtieth International Conference on Logic Programming.
Gelfond, M., & Lifschitz, V. (1991). Classical negation in logic programs and disjunctive databases. New Generation Computing, 9, 365–385.
Guns, T., Dries, A., Nijssen, S., Tack, G., & De Raedt, L. (2015). MiningZinc: A declarative framework for constraint-based mining. Artificial Intelligence, in press.
Guns, T., Nijssen, S., & De Raedt, L. (2011). Itemset mining: A constraint programming perspective. Artificial Intelligence, 175(12–13), 1951–1983.
Gupta, M., & Han, J. (2013). Applications of pattern discovery using sequential data mining. In Data mining: Concepts, methodologies, tools, and applications (pp. 947–970). IGI-Global.
Guyet, T., Moinard, Y., & Quiniou, R. (2014). Using answer set programming for pattern mining. In Proceedings of Conference “Intelligence Artificielle Fondamentale” (IAF).
Guyet, T., Moinard, Y., Quiniou, R., & Schaub, T. (2016). Fouille de motifs séquentiels avec ASP. In Proceedings of Conference “Extraction et la Gestion des Connaissances” (EGC) (pp. 39–50).
Imielinski, T., & Mannila, H. (1996). A database perspective on knowledge discovery. Communications of the ACM, 39(11), 58–64.
Janhunen, T., & Niemelä, I. (2016). The answer set programming paradigm. AI Magazine, 37, 13–24.
Järvisalo, M. (2011). Itemset mining as a challenge application for answer set enumeration. In Proceedings of the Conference on Logic Programming and Nonmonotonic Reasoning (pp. 304–310).
Lallouet, A., Moinard, Y., Nicolas, P., & Stéphan, I. (2013). Programmation logique. In P. Marquis, O. Papini, & H. Prade (Eds.), Panorama de l’intelligence artificielle: ses bases méthodologiques, ses développements (Vol. 2). Cépaduès.
Lefèvre, C., & Nicolas, P. (2009). The first version of a new ASP solver: ASPeRiX. In Proceedings of the Conference on Logic Programming and Nonmonotonic Reasoning (pp. 522–527).
Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S., et al. (2006). The DLV system for knowledge representation and reasoning. ACM Transactions on Computational Logic, 7(3), 499–562.
Lhote, L. (2010). Number of frequent patterns in random databases. In Skiadas, C. H. (Ed.), Advances in data analysis, Statistics for industry and technology (pp. 33–45).
Lifschitz, V. (2008). What is answer set programming? In Proceedings of the Conference on Artificial Intelligence (AAAI) (pp. 1594–1597).
Low-Kam, C., Raïssi, C., Kaytoue, M., & Pei, J. (2013). Mining statistically significant sequential patterns. In Proceedings of the IEEE International Conference on Data Mining (pp. 488–497).
Métivier, J.-P., Loudni, S., & Charnois, T. (2013). A constraint programming approach for mining sequential patterns in a sequence database. In Proceedings of the Workshops of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD).
Mooney, C. H., & Roddick, J. F. (2013). Sequential pattern mining—Approaches and algorithms. ACM Computing Surveys, 45(2), 1–39.
Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19, 629–679.
Negrevergne, B., Dries, A., Guns, T., & Nijssen, S. (2013). Dominance programming for itemset mining. In Proceedings of the International Conference on Data Mining (pp. 557–566).
Negrevergne, B., & Guns, T. (2015). Constraint-based sequence mining using constraint programming. In Proceedings of International Conference on Integration of AI and OR Techniques in Constraint Programming, CPAIOR (pp. 288–305).
Nethercote, N., Stuckey, P. J., Becket, R., Brand, S., Duck, G. J., & Tack, G. (2007). MiniZinc: Towards a standard CP modelling language. In Proceedings of the Conference on Principles and Practice of Constraint Programming (pp. 529–543).
Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., et al. (2004). Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11), 1424–1440.
Pei, J., Han, J., & Wang, W. (2007). Constraint-based sequential pattern mining: The pattern-growth methods. Journal of Intelligent Information Systems, 28(2), 133–160.
Perer, A., & Wang, F. (2014). Frequence: Interactive mining and visualization of temporal frequent event sequences. In Proceedings of the International Conference on Intelligent User Interfaces (pp. 153–162).
Rossi, F., Van Beek, P., & Walsh, T. (2006). Handbook of constraint programming. Elsevier.
Shen, W., Wang, J., & Han, J. (2014). Sequential pattern mining. In Aggarwal, C. C., & Han, J. (Eds.), Frequent pattern mining (pp. 261–282). Springer.
Simons, P., Niemelä, I., & Soininen, T. (2002). Extending and implementing the stable model semantics. Artificial Intelligence, 138(1–2), 181–234.
Srikant, R., & Agrawal, R. (1996). Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the 5th International Conference on Extending Database Technology (pp. 3–17).
Syrjänen, T., & Niemelä, I. (2001). The smodels system. In Proceedings of the Conference on Logic Programming and Nonmonotonic Reasoning (pp. 434–438).
Ugarte, W., Boizumault, P., Crémilleux, B., Lepailleur, A., Loudni, S., Plantevit, M., Raïssi, C., & Soulet, A. (2015). Skypattern mining: From pattern condensed representations to dynamic constraint satisfaction problems. Artificial Intelligence, in press.
Uno, T. (2004). http://research.nii.ac.jp/~uno/code/lcm_seq.html.
Vautier, A., Cordier, M., & Quiniou, R. (2007). Towards data mining without information on knowledge structure. In Proceedings of the Conference on Principles and Practice of Knowledge Discovery in Databases (pp. 300–311).
Wang, J., & Han, J. (2004). BIDE: Efficient mining of frequent closed sequences. In Proceedings of the International Conference on Data Engineering (pp. 79–90).
Yan, X., Han, J., & Afshar, R. (2003). CloSpan: Mining closed sequential patterns in large datasets. In Proceedings of the SIAM Conference on Data Mining (pp. 166–177).
Zaki, M. J. (2001). SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 42(1/2), 31–60.
Zhang, L., Luo, P., Tang, L., Chen, E., Liu, Q., Wang, M., et al. (2015). Occupancy-based frequent pattern mining. ACM Transactions on Knowledge Discovery from Data, 10(2), 1–33.

Acknowledgements

We would like to thank Roland Kaminski and Max Ostrowski for their significant input and comments on the ASP encodings, and Benjamin Negrevergne and Tias Guns for their suggestions about the experimental part. We also thank the anonymous reviewers for their valuable comments and constructive suggestions.

Author information

Authors and Affiliations

AGROCAMPUS-OUEST/IRISA-UMR 6074, Rennes, France
Thomas Guyet
Inria – Centre de Rennes, Rennes, France
Yves Moinard & René Quiniou
Potsdam University, Potsdam, Germany
Torsten Schaub

Corresponding author

Correspondence to Thomas Guyet.

Editor information

Editors and Affiliations

University of Bordeaux, Bordeaux, France
Bruno Pinaud
Polytech Nantes, Nantes, France
Fabrice Guillet
University of Caen Normandie, Caen Cedex 5, France
Bruno Cremilleux
University of Reims Champagne-Ardenne, Reims, France
Cyril de Runz

Appendix

Listing 3.10 illustrates how the encoding of the skip-gaps strategy can be transformed to mine sequential patterns that are sequences of itemsets.

The first difference from the encoding of Listing 3.2 concerns the generation of patterns. The upper bound constraint of the choice rule in Line 9 has been removed, so that every non-empty subset of \(\mathscr{I}\) can be generated.

The second difference is that the new ASP rules verify the inclusion of all items in itemsets. In Line 14, the conditional literal seq(T,P,I):pat(1,I) indicates that for each atom pat(1,I) there must exist an atom seq(T,P,I) for the rule body to be satisfied. A similar expression is used in Line 15.
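The inclusion test expressed by seq(T,P,I):pat(1,I) can be mirrored in ordinary code. The following is a minimal Python sketch under stated assumptions, not a transcription of Listing 3.10: the names `itemset_included` and `embeds` and the toy sequence are hypothetical, itemsets are modelled as frozensets, and a leftmost-first scan stands in for the solver's search over positions.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def itemset_included(pat_itemset: Itemset, seq_itemset: Itemset) -> bool:
    """Reading of `seq(T,P,I) : pat(L,I)`: for every item I of the pattern
    itemset, the same item I must occur in the sequence itemset."""
    return all(item in seq_itemset for item in pat_itemset)


def embeds(pattern: Sequence[Itemset], seq: Sequence[Itemset]) -> bool:
    """True if each pattern itemset is included in a distinct sequence
    itemset, with positions strictly increasing (gaps are allowed)."""
    pos = 0
    for pat_itemset in pattern:
        # Advance to the first position whose itemset contains pat_itemset.
        while pos < len(seq) and not itemset_included(pat_itemset, seq[pos]):
            pos += 1
        if pos == len(seq):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


# Toy sequence of itemsets (illustrative only).
sequence = [frozenset("ab"), frozenset("c"), frozenset("abc")]
print(embeds([frozenset("a"), frozenset("ac")], sequence))
print(embeds([frozenset("ab"), frozenset("ab"), frozenset("c")], sequence))
```

Matching each pattern itemset at the earliest possible position is sufficient for a yes/no answer, because choosing an earlier match never rules out a later one.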

Cite this chapter

Guyet, T., Moinard, Y., Quiniou, R., Schaub, T. (2018). Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks. In: Pinaud, B., Guillet, F., Cremilleux, B., de Runz, C. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 732. Springer, Cham. https://doi.org/10.1007/978-3-319-65406-5_3

DOI: https://doi.org/10.1007/978-3-319-65406-5_3
Published: 11 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65405-8
Online ISBN: 978-3-319-65406-5
eBook Packages: Engineering, Engineering (R0)
