Mining Strongly Closed Itemsets from Data Streams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10558)

Abstract

We consider the problem of mining strongly closed itemsets from transactional data streams. Compactness and stability against changes in the input are two characteristic features of this kind of itemset that make it appealing for various applications. Exploiting their algebraic and algorithmic properties, we propose an algorithm based on reservoir sampling for approximating such itemsets in the landmark streaming setting, prove its correctness, and show empirically that it yields a considerable speed-up over a straightforward naive algorithm without any significant loss in precision and recall. As a motivating application, we experimentally demonstrate the suitability of strongly closed itemsets for concept drift detection in transactional data streams.
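To illustrate the notion of strong closedness, the following minimal sketch checks it by brute force on a small in-memory transaction list. It assumes the definition of Boley et al. [2]: for an integer Δ ≥ 0, an itemset X is Δ-closed (strongly closed) if adding any further item e lowers its support by more than Δ, i.e., supp(X) − supp(X ∪ {e}) > Δ; for Δ = 0 this reduces to ordinary closedness [11]. The function names and the toy database are hypothetical, and the check is only an illustration of the definition, not the streaming algorithm proposed in the paper.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_delta_closed(itemset: Itemset, transactions: List[Itemset], delta: int) -> bool:
    """Check Delta-closedness (definition assumed from Boley et al. [2]).

    X is Delta-closed if supp(X) - supp(X | {e}) > delta for every item e not
    in X. Support is anti-monotone, so checking single-item extensions is
    enough to cover all proper supersets. delta = 0 gives ordinary closedness.
    """
    base = support(itemset, transactions)
    candidates = set().union(*transactions) - itemset
    return all(base - support(itemset | {e}, transactions) > delta for e in candidates)


# Toy transaction database (hypothetical data).
D = [frozenset("abc"), frozenset("ab"), frozenset("abc"), frozenset("a")]
print(is_delta_closed(frozenset("a"), D, 0))  # True: support drops from 4 to 3 (b) and 2 (c)
print(is_delta_closed(frozenset("a"), D, 1))  # False: adding b costs only 1 transaction
print(is_delta_closed(frozenset("b"), D, 0))  # False: {a, b} has the same support as {b}
```

Larger Δ values admit fewer itemsets, which is one way to see the compactness mentioned in the abstract: only itemsets whose every extension loses "many" transactions survive.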

Notes

  1. Due to space limitations, we omit frequency constraints in this short version.

  2. We note that Hoeffding’s inequality applies to samples without replacement as well [5]. A tighter bound can be derived from Serfling’s inequality [12]; however, the improvement becomes marginal as the length of the data stream grows.

  3. http://fimi.ua.ac.be/data/.

  4. We will present further practical applications (e.g., computer-aided product configuration) in the long version of this paper.
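The abstract and Note 2 together point to a general recipe: keep a uniform sample without replacement of the transactions seen so far by reservoir sampling [7, 13], and choose its size via Hoeffding's inequality [5] so that relative supports estimated on the sample are accurate with high probability. The sketch below is a minimal illustration under that assumption. It uses the textbook two-sided bound n ≥ ln(2/δ)/(2ε²) and the classic reservoir scheme known as Algorithm R; the names `hoeffding_sample_size`, `Reservoir`, ε (`epsilon`) and δ (`delta`) are hypothetical, and the exact bound and sample size used in the paper may differ.

```python
import math
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest integer n with 2 * exp(-2 * n * epsilon**2) <= delta.

    By Hoeffding's inequality, the relative support of one fixed itemset in a
    uniform sample of n transactions then deviates from its true relative
    support by at least epsilon with probability at most delta.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


class Reservoir(Generic[T]):
    """Uniform sample without replacement of a stream (Algorithm R)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            # Fill phase: keep the first `capacity` elements unconditionally.
            self.items.append(item)
        else:
            # The seen-th element survives with probability capacity / seen
            # and then evicts a uniformly chosen member of the current sample.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


# Usage: size the reservoir for epsilon = 0.01 and delta = 0.05.
n = hoeffding_sample_size(0.01, 0.05)
print(n)  # 18445
sample = Reservoir[int](capacity=n, seed=42)
for transaction_id in range(100_000):  # stand-in for a stream of transactions
    sample.add(transaction_id)
print(len(sample.items), sample.seen)  # 18445 100000
```

The bound above holds for a single, fixed itemset. To cover m itemsets simultaneously, one would typically replace δ by δ/m (union bound); whether and how the paper does so is not stated here.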

References

  1. Boley, M., Horváth, T., Poigné, A., Wrobel, S.: Listing closed sets of strongly accessible set systems with applications to data mining. Theoret. Comput. Sci. 411(3), 691–700 (2010)

  2. Boley, M., Horváth, T., Wrobel, S.: Efficient discovery of interesting patterns based on strong closedness. Stat. Anal. Data Mining 2(5–6), 346–360 (2009)

  3. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)

  4. Gély, A.: A generic algorithm for generating closed sets of a binary relation. In: Ganter, B., Godin, R. (eds.) ICFCA 2005. LNCS, vol. 3403, pp. 223–234. Springer, Heidelberg (2005). doi:10.1007/978-3-540-32262-7_15

  5. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

  6. Iwanuma, K., Yamamoto, Y., Fukuda, S.: An on-line approximation algorithm for mining frequent closed itemsets based on incremental intersection. In: Proceedings of the 19th International Conference on Extending Database Technology, pp. 704–705 (2016)

  7. Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. 2. Addison-Wesley, Reading (1997)

  8. Lichman, M.: UCI machine learning repository (2013)

  9. Liu, X., Guan, J., Hu, P.: Mining frequent closed itemsets from a landmark window over online data streams. Comput. Math. Appl. 57(6), 927–936 (2009)

  10. Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pp. 346–357. VLDB Endowment (2002)

  11. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24(1), 25–46 (1999)

  12. Serfling, R.J.: Probability inequalities for the sum in sampling without replacement. Ann. Statist. 2(1), 39–48 (1974)

  13. Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985)

  14. Yen, S.J., Wu, C.W., Lee, Y.S., Tseng, V.S., Hsieh, C.H.: A fast algorithm for mining frequent closed itemsets over stream sliding window. In: IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 996–1002 (2011)

Author information

Corresponding author

Correspondence to Daniel Trabold.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Trabold, D., Horváth, T. (2017). Mining Strongly Closed Itemsets from Data Streams. In: Yamamoto, A., Kida, T., Uno, T., Kuboyama, T. (eds) Discovery Science. DS 2017. Lecture Notes in Computer Science, vol 10558. Springer, Cham. https://doi.org/10.1007/978-3-319-67786-6_18

  • DOI: https://doi.org/10.1007/978-3-319-67786-6_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67785-9

  • Online ISBN: 978-3-319-67786-6