Improving constrained pattern mining with first-fail-based heuristics

Data Mining and Knowledge Discovery

Abstract

In this paper, we present a general framework for mining patterns under antimonotone constraints. The framework structures the pattern space in a way that makes it easy to integrate constraints into the mining process. We also introduce a strategy that exploits background information about the data to speed up mining. We illustrate our approach on a popular structured data mining problem, frequent subgraph mining, and show, through experiments on synthetic and real-life data, that this general approach has advantages over state-of-the-art pattern mining algorithms.
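To make the notion of an antimonotone constraint concrete, here is a minimal sketch, assuming plain itemsets rather than graphs and two illustrative constraints: a minimum support and a maximum total item weight (weights assumed non-negative). A constraint is antimonotone if, whenever a pattern violates it, every super-pattern violates it too, so a depth-first search can discard a whole branch at the first violation. The names `mine`, `min_support` and `max_weight` are hypothetical; this is not the framework or the pattern-space structure described in the paper.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]
Constraint = Callable[[Itemset], bool]


def min_support(transactions: Sequence[Itemset], minsup: int) -> Constraint:
    # Support can only shrink as a pattern grows, so "support >= minsup" is antimonotone.
    return lambda s: sum(1 for t in transactions if s <= t) >= minsup


def max_weight(weights: Dict[str, int], budget: int) -> Constraint:
    # With non-negative weights the total only grows, so "total <= budget" is antimonotone.
    return lambda s: sum(weights[x] for x in s) <= budget


def mine(items: Sequence[str], constraints: List[Constraint]) -> List[Itemset]:
    """Enumerate every itemset that satisfies all (antimonotone) constraints."""
    ordered = sorted(items)
    results: List[Itemset] = []

    def extend(prefix: Itemset, start: int) -> None:
        # Only add items after the last one used, so each itemset is generated once.
        for i in range(start, len(ordered)):
            candidate = prefix | {ordered[i]}
            # all() stops at the first failing constraint; since every superset of
            # candidate would fail it too, the whole subtree below it is skipped.
            if all(check(candidate) for check in constraints):
                results.append(candidate)
                extend(candidate, i + 1)

    extend(frozenset(), 0)
    return results


transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "d"}),
    frozenset({"a", "c", "d"}),
    frozenset({"b", "c"}),
]
weights = {"a": 1, "b": 2, "c": 3, "d": 4}
constraints = [min_support(transactions, 2), max_weight(weights, 4)]
result = mine(["a", "b", "c", "d"], constraints)
print(["".join(sorted(s)) for s in result])
# ['a', 'ab', 'ac', 'b', 'c', 'd']
```

Here `ad` and `bc` occur in two transactions but exceed the weight budget of 4, while `abc`, `abd`, `acd`, `bd` and `cd` occur in only one; because those branches are cut, no larger candidates are ever built from them. Because `all()` short-circuits, the order of `constraints` changes only the cost, not the result; putting the check most likely to fail first is loosely in the spirit of the fail-first principle, but it is not claimed to be the heuristic used in the paper.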



Author information

Correspondence to Christian Desrosiers.

Additional information

Responsible editor: M.J. Zaki.

About this article

Cite this article

Desrosiers, C., Galinier, P., Hertz, A. et al. Improving constrained pattern mining with first-fail-based heuristics. Data Min Knowl Disc 23, 63–90 (2011). https://doi.org/10.1007/s10618-010-0199-1
