Abstract
The output of boolean association rule mining algorithms is often too large for manual examination. For dense datasets, it is often impractical even to generate all frequent itemsets. The closed itemset approach handles this information overload by pruning “uninteresting” rules, following the observation that most rules can be derived from other rules. In this paper, we propose a new framework, the generalized closed (or g-closed) itemset framework. By allowing a small tolerance in the accuracy of itemset supports, we show that the number of such redundant rules is far larger than previously estimated. Our scheme can be integrated into both levelwise algorithms (such as Apriori) and two-pass algorithms (such as ARMOR). We evaluate its performance by measuring the reduction in both output size and response time. Our experiments show that incorporating g-closed itemsets provides significant performance improvements on a variety of databases.
A poster of this paper appeared in Proc. of IEEE Intl. Conf. on Data Engineering (ICDE), March 2003, Bangalore, India.
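To make the idea of pruning derivable itemsets concrete, the following is a minimal Python sketch, not the paper's algorithm. With `tolerance=0` it keeps only the standard closed frequent itemsets, that is, those with no proper superset of equal support. With a positive tolerance it also drops itemsets whose support count is within `tolerance` of a frequent superset. Treating the tolerance as an absolute difference in support counts is an assumption made purely for illustration; the abstract does not state the formal g-closed definition. The function names `support_counts` and `closed_itemsets` and the toy data are hypothetical.

```python
from itertools import combinations


def support_counts(transactions, min_count):
    """Brute-force support count of every itemset found in >= min_count transactions."""
    items = sorted({item for t in transactions for item in t})
    counts = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_count:
                counts[cset] = count
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of any larger size
    return counts


def closed_itemsets(counts, tolerance=0):
    """Keep itemsets not absorbed by a frequent proper superset.

    An itemset is absorbed when some frequent proper superset has a support
    count within `tolerance` of its own. tolerance=0 gives exact closed
    frequent itemsets; tolerance>0 is an illustrative relaxation (assumption).
    """
    kept = {}
    for itemset, count in counts.items():
        absorbed = any(
            itemset < other and count - other_count <= tolerance
            for other, other_count in counts.items()
        )
        if not absorbed:
            kept[itemset] = count
    return kept


# Toy database of five transactions (hypothetical data).
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("c"),
]

counts = support_counts(transactions, min_count=2)
exact = closed_itemsets(counts, tolerance=0)
approx = closed_itemsets(counts, tolerance=1)

print(len(counts), len(exact), len(approx))  # 7 3 1
```

On this toy database, seven frequent itemsets reduce to three closed itemsets, and a tolerance of one transaction reduces them further to a single itemset, at the cost of supports that are only known to within that tolerance.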
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pudi, V., Haritsa, J.R. (2003). Reducing Rule Covers with Deterministic Error Bounds. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_31
DOI: https://doi.org/10.1007/3-540-36175-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-04760-5
Online ISBN: 978-3-540-36175-6
eBook Packages: Springer Book Archive