Abstract
We present a new approach that produces the rules characterizing classes that are simplest with respect to their left-hand sides. The approach relies on a condensed representation of the data (δ-free sets) that can be computed efficiently. The rules produced have a minimal body, i.e. no proper subset of a rule's left-hand side allows one to conclude on the same class value. We give a sensible sufficient condition that avoids important classification conflicts. Experiments show that the number of rules characterizing classes decreases drastically. The technique remains operational on large data sets and can be used even in the difficult context of highly correlated data, where other algorithms fail.
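To make the notions of δ-free set and minimal body concrete, here is a minimal brute-force sketch in Python. It is not the authors' algorithm. It assumes the usual definition from the free-sets literature (an itemset is δ-free when no rule between its own items holds with at most δ exceptions), and the function names, the `min_support` parameter and the toy data are hypothetical, introduced only for illustration.

```python
from itertools import combinations

def support(itemset, rows):
    """Number of rows that contain every item of `itemset`."""
    return sum(1 for row in rows if itemset <= row)

def is_delta_free(itemset, rows, delta):
    """Assumed definition: no rule (itemset - {z}) -> z has <= delta exceptions."""
    return all(
        support(itemset - {z}, rows) - support(itemset, rows) > delta
        for z in itemset
    )

def delta_strong_rules(rows, labels, delta, min_support=1):
    """Enumerate rules body -> class with at most `delta` exceptions,
    keeping only delta-free bodies that have no smaller concluding body."""
    items = sorted(set().union(*rows))
    classes = sorted(set(labels))
    rules = []
    # Bodies are visited by increasing size, so any smaller body that
    # concludes on the same class is already in `rules` when needed.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            body = frozenset(combo)
            covered = [lab for row, lab in zip(rows, labels) if body <= row]
            if len(covered) < min_support:
                continue
            if not is_delta_free(body, rows, delta):
                continue
            for c in classes:
                exceptions = sum(1 for lab in covered if lab != c)
                has_smaller = any(b < body and k == c for b, k, _ in rules)
                if exceptions <= delta and not has_smaller:
                    rules.append((body, c, exceptions))
    return rules

# Toy data (hypothetical): each row is a set of items plus a class label.
rows = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"c"}, {"b", "c"}]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

exact_rules = delta_strong_rules(rows, labels, delta=0)
tolerant_rules = delta_strong_rules(rows, labels, delta=1, min_support=2)
```

On this toy data, `delta=0` yields the rules {a} → pos and {b, c} → neg, while `delta=1` with `min_support=2` replaces the second rule by the shorter {c} → neg, which has one exception. The sketch enumerates all itemsets and so is exponential in the number of items; it only illustrates the definitions and says nothing about how the condensed representation is computed efficiently.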
Copyright information
© 2003 Springer-Verlag London Limited
About this paper
Cite this paper
Crémilleux, B., Boulicaut, JF. (2003). Simplest Rules Characterizing Classes Generated by δ-Free Sets. In: Bramer, M., Preece, A., Coenen, F. (eds) Research and Development in Intelligent Systems XIX. Springer, London. https://doi.org/10.1007/978-1-4471-0651-7_3
DOI: https://doi.org/10.1007/978-1-4471-0651-7_3
Publisher Name: Springer, London
Print ISBN: 978-1-85233-674-5
Online ISBN: 978-1-4471-0651-7
eBook Packages: Springer Book Archive