Providing Concise Database Covers Instantly by Recursive Tile Sampling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8777)

Abstract

Known pattern discovery algorithms for finding tilings (covers of 0/1-databases consisting of 1-rectangles) cannot be integrated into instant and interactive knowledge discovery (KD) tools, because they fail at least one of two key requirements: (a) to provide results within a short response time of only a few seconds, and (b) to return a concise set of patterns with only a few elements that nevertheless covers a large fraction of the input database. In this paper we present a novel randomized algorithm that works well under these requirements. It is based on the recursive application of a simple tile sampling procedure that can be implemented efficiently using rejection sampling. While, as we analyse, the theoretical solution distribution can be weak in the worst case, the approach performs very well in practice and outperforms previous sampling as well as deterministic algorithms.
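
To make the terms in the abstract concrete, the following is a minimal Python sketch of a tile (an all-ones rectangle given by a set of rows and a set of columns), of the fraction of 1-entries covered by a tiling, and of a naive rejection sampler for a single tile. It is an illustration under stated assumptions and not the procedure from the paper: the function names `is_tile`, `covered_fraction` and `sample_tile`, the list-of-lists database layout, and the uniform proposal over row and column subsets are all hypothetical choices made here. The paper's recursive scheme and its sampling distribution are not reproduced.

```python
import random


def is_tile(db, rows, cols):
    """Return True if rows x cols is a non-empty rectangle containing only 1s."""
    return bool(rows) and bool(cols) and all(
        db[r][c] == 1 for r in rows for c in cols
    )


def covered_fraction(db, tiling):
    """Fraction of the 1-entries of db that lie in at least one tile."""
    ones = {(r, c) for r, row in enumerate(db) for c, v in enumerate(row) if v == 1}
    covered = {(r, c) for rows, cols in tiling for r in rows for c in cols}
    return len(covered & ones) / len(ones) if ones else 0.0


def sample_tile(db, rng, max_tries=10_000):
    """Naive rejection sampling (an assumption, not the paper's sampler).

    Propose a uniformly random pair (row subset, column subset) and accept
    it only if it is a tile. Accepted proposals are uniform over all tiles.
    Returns None if no proposal is accepted within max_tries.
    """
    n_rows, n_cols = len(db), len(db[0])
    for _ in range(max_tries):
        rows = {r for r in range(n_rows) if rng.random() < 0.5}
        cols = {c for c in range(n_cols) if rng.random() < 0.5}
        if is_tile(db, rows, cols):
            return rows, cols
    return None


if __name__ == "__main__":
    # Toy 0/1-database: rows are transactions, columns are items.
    db = [
        [1, 1, 0],
        [1, 1, 0],
        [0, 1, 1],
    ]
    tile = sample_tile(db, random.Random(0))
    print("sampled tile:", tile)
    print("covered fraction:", covered_fraction(db, [tile]))
```

The uniform proposal keeps the sketch short, but its acceptance rate drops quickly on large or sparse databases, so it only serves to show what "accept a proposal only if it is a tile" means on a toy input.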




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Moens, S., Boley, M., Goethals, B. (2014). Providing Concise Database Covers Instantly by Recursive Tile Sampling. In: Džeroski, S., Panov, P., Kocev, D., Todorovski, L. (eds) Discovery Science. DS 2014. Lecture Notes in Computer Science, vol 8777. Springer, Cham. https://doi.org/10.1007/978-3-319-11812-3_19

  • DOI: https://doi.org/10.1007/978-3-319-11812-3_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11811-6

  • Online ISBN: 978-3-319-11812-3

  • eBook Packages: Computer Science, Computer Science (R0)
