Effective Sampling for Mining Association Rules

Li, Yanrong; Gopalan, Raj P.

doi:10.1007/978-3-540-30549-1_35

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

As discovering association rules in a very large database is time consuming, researchers have developed many algorithms to improve efficiency. Sampling can significantly reduce the cost of mining, since the mining algorithm then processes only a small dataset rather than the original database. In particular, when data arrives as a stream at a faster rate than it can be processed, sampling seems to be the only choice. Two key issues for a given data mining task are how to sample the data and how large the sample must be for a given error bound and confidence level. In this paper, we derive the sufficient sample size based on the central limit theorem for sampling large datasets with replacement. This approach requires a smaller sample size than one based on the Chernoff bounds and is effective for association rule mining. The effectiveness of the method has been evaluated on both dense and sparse datasets.
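To make the sample-size comparison in the abstract concrete, here is a minimal Python sketch. The abstract does not show the paper's own derivation, so the formulas below are textbook forms used as assumptions: the normal-approximation (central limit theorem) sample size for a proportion, n = z² p(1 − p) / e², and the additive Chernoff-Hoeffding bound, n ≥ ln(2/δ) / (2e²). The function names, parameter names, and toy transactions are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def clt_sample_size(error, delta, p=0.5):
    """Normal-approximation (CLT) sample size for estimating a proportion.

    error: maximum absolute error tolerated in the estimated support
    delta: allowed probability of exceeding that error (confidence = 1 - delta)
    p:     assumed true support; 0.5 is the worst case (largest variance)
    """
    z = NormalDist().inv_cdf(1 - delta / 2)  # two-sided critical value
    return math.ceil(z * z * p * (1 - p) / (error * error))


def hoeffding_sample_size(error, delta):
    """Additive Chernoff-Hoeffding sample size: n >= ln(2/delta) / (2 * error^2)."""
    return math.ceil(math.log(2 / delta) / (2 * error * error))


def estimate_support(transactions, itemset, n, rng):
    """Estimate the support of `itemset` from n draws taken with replacement."""
    sample = rng.choices(transactions, k=n)  # choices() samples with replacement
    hits = sum(1 for t in sample if itemset <= t)  # itemset is a subset of t
    return hits / n


# Toy data (assumed for illustration); the true support of {"a", "b"} is 0.5.
transactions = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b", "c"}]
n_clt = clt_sample_size(error=0.01, delta=0.05)
n_hoeffding = hoeffding_sample_size(error=0.01, delta=0.05)
print(n_clt, n_hoeffding)
print(estimate_support(transactions, {"a", "b"}, n_clt, random.Random(0)))
```

Under these assumed formulas, an error bound of 0.01 at 95% confidence needs roughly half as many transactions with the CLT-based size as with the Chernoff-Hoeffding size, which is consistent with the abstract's claim of a smaller sample. The CLT figure relies on a normal approximation, so it is an approximate guarantee rather than a strict bound.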

Author information

Authors and Affiliations

Department of Computing, Curtin University of Technology, Kent Street, Bentley, Western Australia, 6102
Yanrong Li & Raj P. Gopalan

Editor information

Editors and Affiliations

Faculty of Information Technology, Monash University, VIC 3800, Australia
Geoffrey I. Webb
Science, Engineering and Technology Portfolio, Royal Melbourne Institute of Technology, VIC 3001, Melbourne, Australia
Xinghuo Yu

Cite this paper

Li, Y., Gopalan, R.P. (2004). Effective Sampling for Mining Association Rules. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_35

DOI: https://doi.org/10.1007/978-3-540-30549-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24059-4
Online ISBN: 978-3-540-30549-1
eBook Packages: Computer Science, Computer Science (R0)
