Effective Sampling for Mining Association Rules

  • Conference paper
AI 2004: Advances in Artificial Intelligence (AI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3339)

Abstract

Because discovering association rules in a very large database is time consuming, researchers have developed many algorithms to improve efficiency. Sampling can significantly reduce the cost of mining, since the mining algorithm needs to process only a small dataset rather than the original database. In particular, if data arrives as a stream at a faster rate than it can be processed, sampling seems to be the only choice. How to sample the data, and how large the sample should be for a given error bound and confidence level, are key issues for particular data mining tasks. In this paper, we derive the sufficient sample size based on the central limit theorem for sampling large datasets with replacement. This approach requires a smaller sample size than one based on Chernoff bounds and is effective for association rule mining. The effectiveness of the method has been evaluated on both dense and sparse datasets.
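To make the comparison of sample sizes concrete, the following is a minimal sketch, not the derivation from the paper itself, which is not reproduced on this page. It assumes the textbook normal-approximation bound n ≥ z²·p(1−p)/ε² for a proportion (with z the two-sided normal quantile for confidence 1−δ, and p = 0.5 as the worst case) and the commonly used Chernoff/Hoeffding-style bound n ≥ ln(2/δ)/(2ε²). The function names `clt_sample_size`, `chernoff_sample_size`, and `estimate_support` are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def clt_sample_size(eps, delta, p=0.5):
    """Sample size from the normal approximation (assumed textbook form).

    eps   -- absolute error bound on the estimated support
    delta -- allowed failure probability (confidence level is 1 - delta)
    p     -- assumed true support; 0.5 maximises p * (1 - p), the worst case
    """
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)  # two-sided normal quantile
    return math.ceil(z * z * p * (1.0 - p) / (eps * eps))


def chernoff_sample_size(eps, delta):
    """Sample size from a Chernoff/Hoeffding-style bound (assumed form)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def estimate_support(transactions, itemset, n, rng):
    """Estimate the support of an itemset from n draws WITH replacement."""
    sample = rng.choices(transactions, k=n)  # choices() samples with replacement
    items = set(itemset)
    hits = sum(1 for t in sample if items.issubset(t))
    return hits / n


if __name__ == "__main__":
    eps, delta = 0.01, 0.05
    print("CLT-based size:     ", clt_sample_size(eps, delta))
    print("Chernoff-based size:", chernoff_sample_size(eps, delta))

    # Toy transactions, purely illustrative.
    db = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b", "c"}]
    rng = random.Random(0)
    print("Estimated support of {a, b}:", estimate_support(db, {"a", "b"}, 1000, rng))
```

Under these assumed formulas, an error bound of 0.01 at 95% confidence gives 9604 transactions for the normal approximation against 18445 for the Chernoff-style bound, which is in line with the abstract's claim that the central-limit-theorem approach needs a smaller sample.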

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Gopalan, R.P. (2004). Effective Sampling for Mining Association Rules. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_35

  • DOI: https://doi.org/10.1007/978-3-540-30549-1_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24059-4

  • Online ISBN: 978-3-540-30549-1

  • eBook Packages: Computer Science, Computer Science (R0)
