Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks

Part of the book series: Studies in Computational Intelligence (SCI, volume 732)

Abstract

This article presents the use of Answer Set Programming (ASP) to mine sequential patterns. ASP is a high-level declarative logic programming paradigm for encoding combinatorial and optimization problems as well as knowledge representation and reasoning. ASP is therefore a good candidate for implementing pattern mining with background knowledge, which has been a long-standing data mining issue. We propose encodings of the classical sequential pattern mining tasks within two representations of embeddings (fill-gaps versus skip-gaps) and for various kinds of patterns: frequent, constrained and condensed. We compare the computational performance of these encodings with each other to gain insight into the efficiency of ASP encodings. The results show that the fill-gaps strategy is better on real problems due to lower memory consumption. Finally, compared to a constraint programming approach (CPSM), another declarative programming paradigm, our proposal showed comparable performance.

Notes

  1. It is important to note that the scope of a variable is the rule, and that each occurrence of a variable in a rule represents the same value.

  2. clingo is fully compliant with the recent ASP standard: https://www.mat.unical.it/aspcomp2013/ASPStandardization.

  3. A similar encoding can be written for the fill-gaps strategy by applying the same changes as above.

  4. asprin (Brewka et al. 2015) is a clingo extension that allows for this kind of comparison. For more details about the use of asprin to extract skypatterns, see Gebser et al. (2016).

  5. https://potassco.org/.

  6. The generator and databases used in our experiments are available at https://sites.google.com/site/aspseqmining.

  7. Using the subset-minimal heuristic keeps the solving of the maximal patterns problem complete.

References

  • Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD Conference on Management of Data (pp. 207–216).

  • Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. In Proceedings of the International Conference on Data Engineering (pp. 3–14).

  • Biere, A., Heule, M., van Maaren, H., & Walsh, T. (2009). Handbook of satisfiability. Frontiers in artificial intelligence and applications (Vol. 185). IOS Press.

  • Bonchi, F., Giannotti, F., Lucchese, C., Orlando, S., Perego, R., & Trasarti, R. (2006). Conquest: A constraint-based querying system for exploratory pattern discovery. In Proceedings of the International Conference on Data Engineering (pp. 159–159).

  • Boulicaut, J.-F., & Jeudy, B. (2005). Constraint-based data mining. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook (pp. 399–416). US: Springer.

  • Brewka, G., Delgrande, J. P., Romero, J., & Schaub, T. (2015). Asprin: Customizing answer set preferences without a headache. In Proceedings of the Conference on Artificial Intelligence (AAAI) (pp. 1467–1474).

  • Bruynooghe, M., Blockeel, H., Bogaerts, B., De Cat, B., De Pooter, S., Jansen, J., et al. (2015). Predicate logic as a modeling language: Modeling and solving some machine learning and data mining problems with IDP3. Theory and Practice of Logic Programming, 15(06), 783–817.

  • Coletta, R., & Negrevergne, B. (2016). A SAT model to mine flexible sequences in transactional datasets. arXiv:1604.00300.

  • Coquery, E., Jabbour, S., Saïs, L., & Salhi, Y. (2012). A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence. In Proceedings of the European Conference on Artificial Intelligence (ECAI) (pp. 258–263).

  • Dao, T., Duong, K., & Vrain, C. (2015). Constrained minimum sum of squares clustering by constraint programming. In Proceedings of Principles and Practice of Constraint Programming (pp. 557–573).

  • De Raedt, L. (2015). Languages for learning and mining. In Proceedings of the Conference on Artificial Intelligence (AAAI) (pp. 4107–4111).

  • Garofalakis, M., Rastogi, R., & Shim, K. (1999). SPIRIT: Sequential pattern mining with regular expression constraints. In Proceedings of the International Conference on Very Large Data Bases (pp. 223–234).

  • Gebser, M., Guyet, T., Quiniou, R., Romero, J., & Schaub, T. (2016). Knowledge-based sequence mining with ASP. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 1497–1504).

  • Gebser, M., Kaminski, R., Kaufmann, B., Ostrowski, M., Schaub, T., & Schneider, M. (2011). Potassco: The Potsdam answer set solving collection. AI Communications, 24(2), 107–124.

  • Gebser, M., Kaminski, R., Kaufmann, B., & Schaub, T. (2014). Clingo = ASP + control: Preliminary report. In Technical Communications of the Thirtieth International Conference on Logic Programming.

  • Gelfond, M., & Lifschitz, V. (1991). Classical negation in logic programs and disjunctive databases. New Generation Computing, 9, 365–385.

  • Guns, T., Dries, A., Nijssen, S., Tack, G., & De Raedt, L. (2015). MiningZinc: A declarative framework for constraint-based mining. Artificial Intelligence, in press.

  • Guns, T., Nijssen, S., & De Raedt, L. (2011). Itemset mining: A constraint programming perspective. Artificial Intelligence, 175(12–13), 1951–1983.

  • Gupta, M., & Han, J. (2013). Applications of pattern discovery using sequential data mining. In Data mining: Concepts, methodologies, tools, and applications (pp. 947–970). IGI-Global.

  • Guyet, T., Moinard, Y., & Quiniou, R. (2014). Using answer set programming for pattern mining. In Proceedings of Conference “Intelligence Artificielle Fondamentale” (IAF).

  • Guyet, T., Moinard, Y., Quiniou, R., & Schaub, T. (2016). Fouille de motifs séquentiels avec ASP. In Proceedings of Conference “Extraction et la Gestion des Connaissances” (EGC) (pp. 39–50).

  • Imielinski, T., & Mannila, H. (1996). A database perspective on knowledge discovery. Communications of the ACM, 39(11), 58–64.

  • Janhunen, T., & Niemelä, I. (2016). The answer set programming paradigm. AI Magazine, 37, 13–24.

  • Järvisalo, M. (2011). Itemset mining as a challenge application for answer set enumeration. In Proceedings of the Conference on Logic Programming and Nonmonotonic Reasoning (pp. 304–310).

  • Lallouet, A., Moinard, Y., Nicolas, P., & Stéphan, I. (2013). Programmation logique. In P. Marquis, O. Papini, & H. Prade (Eds.), Panorama de l’intelligence artificielle: ses bases méthodologiques, ses développements (Vol. 2). Cépaduès.

  • Lefèvre, C., & Nicolas, P. (2009). The first version of a new ASP solver: ASPeRiX. In Proceedings of the Conference on Logic Programming and Nonmonotonic Reasoning (pp. 522–527).

  • Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Perri, S., et al. (2006). The DLV system for knowledge representation and reasoning. ACM Transactions on Computational Logic, 7(3), 499–562.

  • Lhote, L. (2010). Number of frequent patterns in random databases. In C. H. Skiadas (Ed.), Advances in data analysis, Statistics for industry and technology (pp. 33–45).

  • Lifschitz, V. (2008). What is answer set programming? In Proceedings of the Conference on Artificial Intelligence (AAAI) (pp. 1594–1597).

  • Low-Kam, C., Raïssi, C., Kaytoue, M., & Pei, J. (2013). Mining statistically significant sequential patterns. In Proceedings of the IEEE International Conference on Data Mining (pp. 488–497).

  • Métivier, J.-P., Loudni, S., & Charnois, T. (2013). A constraint programming approach for mining sequential patterns in a sequence database. In Proceedings of the Workshops of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD).

  • Mooney, C. H., & Roddick, J. F. (2013). Sequential pattern mining—Approaches and algorithms. ACM Computing Surveys, 45(2), 1–39.

  • Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19, 629–679.

  • Negrevergne, B., Dries, A., Guns, T., & Nijssen, S. (2013). Dominance programming for itemset mining. In Proceedings of the International Conference on Data Mining (pp. 557–566).

  • Negrevergne, B., & Guns, T. (2015). Constraint-based sequence mining using constraint programming. In Proceedings of International Conference on Integration of AI and OR Techniques in Constraint Programming, CPAIOR (pp. 288–305).

  • Nethercote, N., Stuckey, P. J., Becket, R., Brand, S., Duck, G. J., & Tack, G. (2007). MiniZinc: Towards a standard CP modelling language. In Proceedings of the Conference on Principles and Practice of Constraint Programming (pp. 529–543).

  • Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., et al. (2004). Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11), 1424–1440.

  • Pei, J., Han, J., & Wang, W. (2007). Constraint-based sequential pattern mining: The pattern-growth methods. Journal of Intelligent Information Systems, 28(2), 133–160.

  • Perer, A., & Wang, F. (2014). Frequence: Interactive mining and visualization of temporal frequent event sequences. In Proceedings of the International Conference on Intelligent User Interfaces (pp. 153–162).

  • Rossi, F., Van Beek, P., & Walsh, T. (2006). Handbook of constraint programming. Elsevier.

  • Shen, W., Wang, J., & Han, J. (2014). Sequential pattern mining. In C. C. Aggarwal & J. Han (Eds.), Frequent pattern mining (pp. 261–282). Springer.

  • Simons, P., Niemelä, I., & Soininen, T. (2002). Extending and implementing the stable model semantics. Artificial Intelligence, 138(1–2), 181–234.

  • Srikant, R., & Agrawal, R. (1996). Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the 5th International Conference on Extending Database Technology (pp. 3–17).

  • Syrjänen, T., & Niemelä, I. (2001). The smodels system. In Proceedings of the Conference on Logic Programming and Nonmonotonic Reasoning (pp. 434–438).

  • Ugarte, W., Boizumault, P., Crémilleux, B., Lepailleur, A., Loudni, S., Plantevit, M., Raïssi, C., & Soulet, A. (2015). Skypattern mining: From pattern condensed representations to dynamic constraint satisfaction problems. Artificial Intelligence, in press.

  • Uno, T. (2004). http://research.nii.ac.jp/~uno/code/lcm_seq.html.

  • Vautier, A., Cordier, M., & Quiniou, R. (2007). Towards data mining without information on knowledge structure. In Proceedings of the Conference on Principles and Practice of Knowledge Discovery in Databases (pp. 300–311).

  • Wang, J., & Han, J. (2004). BIDE: Efficient mining of frequent closed sequences. In Proceedings of the International Conference on Data Engineering (pp. 79–90).

  • Yan, X., Han, J., & Afshar, R. (2003). CloSpan: Mining closed sequential patterns in large datasets. In Proceedings of the SIAM Conference on Data Mining (pp. 166–177).

  • Zaki, M. J. (2001). SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 42(1/2), 31–60.

  • Zhang, L., Luo, P., Tang, L., Chen, E., Liu, Q., Wang, M., et al. (2015). Occupancy-based frequent pattern mining. ACM Transactions on Knowledge Discovery from Data, 10(2), 1–33.

Acknowledgements

We would like to thank Roland Kaminski and Max Ostrowski for their significant input and comments about ASP encodings, and Benjamin Negrevergne and Tias Guns for their suggestions about the experimental part. We also thank the anonymous reviewers for their valuable comments and constructive suggestions.

Author information

Corresponding author

Correspondence to Thomas Guyet.

Appendix

Listing 3.10 illustrates how the encoding of the skip-gaps strategy can be transformed to mine sequential patterns that are sequences of itemsets.

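[Listing 3.10: skip-gaps encoding adapted to sequential patterns made of itemsets; the listing is not reproduced here.]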

The first difference with the encoding of Listing 3.2 concerns the generation of patterns. The upper bound constraint of the choice rule in Line 9 has been removed, enabling the possible generation of every non-empty subset of \(\mathscr {I}\).
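To give a feel for the size of the resulting search space, here is a minimal Python sketch. It is not the chapter's ASP encoding: the function name `nonempty_subsets` and the three-item alphabet are assumptions made only for this illustration. It enumerates the non-empty subsets of a small item set, which are the candidate itemsets that the unbounded choice rule may generate at one pattern position. For n items there are 2^n - 1 such candidates.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Return every non-empty subset of `items` as a frozenset."""
    ordered = sorted(items)
    return [
        frozenset(combo)
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


# Hypothetical item alphabet, chosen only for the example.
candidates = nonempty_subsets({"a", "b", "c"})
print(len(candidates))  # 2**3 - 1 = 7
```

The exponential growth in the number of candidates per position is one reason why itemset patterns are harder to mine than patterns of single items.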

The second difference is that the new ASP rules verify the inclusion of all items in itemsets. In Line 14, the conditional literal seq(T,P,I):pat(1,I) indicates that, for each atom pat(1,I), there must exist an atom seq(T,P,I) for the rule body to be satisfied. A similar expression is used in Line 15.
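The meaning of the condition seq(T,P,I):pat(1,I) can be sketched outside ASP as a "for all items" test. The Python below is only an illustration under assumptions: atoms are modelled as tuples in plain sets, and the function name `itemset_included` and the sample facts are hypothetical, not taken from the chapter.

```python
def itemset_included(pat, seq, t, p, pos=1):
    """Check that every item I with pat(pos, I) also satisfies seq(t, p, I).

    `pat` is a set of (position, item) tuples and `seq` is a set of
    (sequence_id, position, item) tuples. If the pattern has no item at
    `pos`, the check holds trivially, as an empty conjunction does.
    """
    return all((t, p, item) in seq for (k, item) in pat if k == pos)


# Hypothetical facts: pattern <(a b) (c)> and two sequences.
pat = {(1, "a"), (1, "b"), (2, "c")}
seq = {(1, 1, "a"), (1, 1, "b"), (1, 1, "d"), (1, 2, "c"), (2, 1, "a")}

print(itemset_included(pat, seq, t=1, p=1))  # True: {a, b} is included in {a, b, d}
print(itemset_included(pat, seq, t=2, p=1))  # False: b is missing from {a}
```

The first itemset of the pattern matches position 1 of sequence 1 because the sequence itemset may contain extra items (here d); only inclusion is required, not equality.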

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Guyet, T., Moinard, Y., Quiniou, R., Schaub, T. (2018). Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks. In: Pinaud, B., Guillet, F., Cremilleux, B., de Runz, C. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 732. Springer, Cham. https://doi.org/10.1007/978-3-319-65406-5_3

  • DOI: https://doi.org/10.1007/978-3-319-65406-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65405-8

  • Online ISBN: 978-3-319-65406-5

  • eBook Packages: Engineering, Engineering (R0)
