Learning What Matters – Sampling Interesting Patterns

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10234)

Abstract

In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account—but not both. Consequently, the analyst has to invest substantial effort in identifying those patterns that are relevant to her specific interests and goals.

To address this problem, we propose a novel approach that combines pattern sampling with interactive data mining. In particular, we introduce the LetSIP algorithm, which builds upon recent advances in (1) weighted sampling in SAT and (2) learning to rank in interactive pattern mining. Specifically, it exploits user feedback to directly learn the parameters of the sampling distribution that represents the user’s interests.
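The idea of learning the parameters of a sampling distribution from user feedback can be illustrated with a toy sketch. This is not the LetSIP implementation: it assumes a tiny, explicitly enumerated set of patterns instead of SAT-based weighted sampling, and it swaps in a plain pairwise logistic gradient step for the learning-to-rank component. The feature vectors, the hidden user utility, and all function names are hypothetical and serve only as illustration.

```python
import math
import random

def sampling_weights(features, w):
    """Unnormalised weight per pattern: logistic function of a linear score."""
    return [1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for x in features]

def sample_patterns(features, w, k, rng):
    """Draw k pattern indices with probability proportional to their weight."""
    weights = sampling_weights(features, w)
    return rng.choices(range(len(features)), weights=weights, k=k)

def update_from_feedback(features, w, preferred, other, lr=0.5):
    """One pairwise step that pushes `preferred` to score above `other`."""
    diff = [a - b for a, b in zip(features[preferred], features[other])]
    margin = sum(wi * di for wi, di in zip(w, diff))
    # Gradient step on the pairwise logistic loss log(1 + exp(-margin)).
    scale = lr / (1.0 + math.exp(margin))
    return [wi + scale * di for wi, di in zip(w, diff)]

def simulate(rounds=200, seed=0):
    """Interleave sampling and learning against an emulated user."""
    rng = random.Random(seed)
    # Hypothetical toy patterns, each described by two features.
    features = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.9, 0.1)]
    hidden = (2.0, -1.0)  # emulated user interest, unknown to the learner

    def utility(i):
        return sum(h * x for h, x in zip(hidden, features[i]))

    w = [0.0, 0.0]
    for _ in range(rounds):
        a, b = sample_patterns(features, w, 2, rng)
        if utility(a) == utility(b):
            continue  # same pattern drawn twice: no preference to learn from
        preferred, other = (a, b) if utility(a) > utility(b) else (b, a)
        w = update_from_feedback(features, w, preferred, other)
    return features, w
```

In this sketch each round samples two patterns from the current distribution, asks the emulated user which one is preferred, and immediately updates the weights, so later samples are drawn from a distribution that is closer to the user's interests.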

We compare the performance of the proposed algorithm to the state of the art in interactive pattern mining by emulating the interests of a user. The resulting system allows efficient, interleaved learning and sampling, and thus supports user-specific, anytime data exploration. Finally, LetSIP demonstrates favourable trade-offs concerning both quality–diversity and exploitation–exploration when compared to existing methods.

Notes

  1. Source: https://dtai.cs.kuleuven.be/CP4IM/datasets/.

Acknowledgements

Vladimir Dzyuba is supported by FWO-Vlaanderen. The authors would like to thank the anonymous reviewers for their helpful feedback.

Author information

Corresponding author

Correspondence to Vladimir Dzyuba.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dzyuba, V., van Leeuwen, M. (2017). Learning What Matters – Sampling Interesting Patterns. In: Kim, J., Shim, K., Cao, L., Lee, JG., Lin, X., Moon, YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10234. Springer, Cham. https://doi.org/10.1007/978-3-319-57454-7_42

  • DOI: https://doi.org/10.1007/978-3-319-57454-7_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57453-0

  • Online ISBN: 978-3-319-57454-7

  • eBook Packages: Computer Science, Computer Science (R0)