Data Mining and Knowledge Discovery, Volume 23, Issue 1, pp 63–90

Improving constrained pattern mining with first-fail-based heuristics

  • Christian Desrosiers
  • Philippe Galinier
  • Alain Hertz
  • Pierre Hansen


In this paper, we present a general framework for mining patterns under antimonotone constraints. The framework structures the pattern space in a way that facilitates integrating constraints into the mining process. We also introduce a powerful strategy that uses background information on the data to speed up the mining process. We illustrate our approach on a popular structured data mining problem, frequent subgraph mining, and show, through experiments on synthetic and real-life data, that this general approach has advantages over state-of-the-art pattern mining algorithms.


  • Constraint-based pattern mining
  • Frequent subgraph mining
  • First-fail heuristic





Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Christian Desrosiers (1)
  • Philippe Galinier (2)
  • Alain Hertz (2)
  • Pierre Hansen (3)

  1. Ecole de Technologie Supérieure, Montreal, Canada
  2. Ecole Polytechnique de Montréal, Montreal, Canada
  3. HEC Montréal, Montreal, Canada
