Supervised Evaluation of Top-k Itemset Mining Algorithms

Lucchese, Claudio; Orlando, Salvatore; Perego, Raffaele

doi:10.1007/978-3-319-22729-0_7

Claudio Lucchese¹⁵,
Salvatore Orlando¹⁶ &
Raffaele Perego¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9263))

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

1700 Accesses
1 Citations

Abstract

A major mining task for binary matrixes is the extraction of approximate top-k patterns that are able to concisely describe the input data. The top-k pattern discovery problem is commonly stated as an optimization one, where the goal is to minimize a given cost function, e.g., the accuracy of the data description.

In this work, we review several greedy state-of-the-art algorithms, namely Asso, Hyper+, and PaNDa \(^{+}\), and propose a methodology to compare the patterns extracted. In evaluating the set of mined patterns, we aim at overcoming the usual assessment methodology, which only measures the given cost function to minimize. Thus, we evaluate how good are the models/patterns extracted in unveiling supervised knowledge on the data. To this end, we test algorithms and diverse cost functions on several datasets from the UCI repository. As contribution, we show that PaNDa \(^{+}\) performs best in the majority of the cases, since the classifiers built over the mined patterns used as dataset features are the most accurate.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

References

Miettinen, P., Mielikainen, T., Gionis, A., Das, G., Mannila, H.: The discrete basis problem. IEEE TKDE 20(10), 1348–1362 (2008)
Google Scholar
Xiang, Y., Jin, R., Fuhry, D., Dragan, F.F.: Summarizing transactional databases with overlapped hyperrectangles. Data Min. Knowl. Discov. 23(2), 215–251 (2011)
Article MathSciNet MATH Google Scholar
Lucchese, C., Orlando, S., Perego, R.: Mining top-k patterns from binary datasets in presence of noise. In: SDM, pp. 165–176. SIAM (2010)
Google Scholar
Lucchese, C., Orlando, S., Perego, R.: A unifying framework for mining approximate top-k binary patterns. IEEE TKDE 26, 2900–2913 (2014)
Google Scholar
Cheng, H., Yu, P.S., Han, J.: AC-Close: efficiently mining approximate closed itemsets by core pattern recovery. In: Proceedings of ICDM, pp. 839–844. IEEE Computer Society (2006)
Google Scholar
Miettinen, P., Vreeken, J.: Model order selection for boolean matrix factorization. In: Proceedings of KDD, pp. 51–59. ACM (2011)
Google Scholar
Xiang, Y., Jin, R., Fuhry, D., Dragan, F.F.: Succinct summarization of transactional databases: an overlapped hyperrectangle scheme. In: Proceedings of KDD, pp. 758–766. ACM (2008)
Google Scholar
Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471 (1978)
Article MATH Google Scholar
Lucchese, C., Orlando, S., Perego, R.: A generative pattern model for mining binary datasets. In: SAC, pp. 1109–1110. ACM (2010)
Google Scholar
Joachims, T.: Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Kluwer Academic Publishers, Norwell (2002)
Book Google Scholar
Cherkassky, V., Ma, Y.: Practical selection of svm parameters and noise estimation for SVM regression. Neural Netw. 17(1), 113–126 (2004)
Article MATH Google Scholar
Cheng, H., Yan, X., Han, J., wei Hsu, C.: Discriminative frequent pattern analysis for effective classification. In: Proceedings of ICDE, pp. 716–725 (2007)
Google Scholar
Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of SIGIR, pp. 50–57. ACM (1999)
Google Scholar
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
MATH Google Scholar
Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. John and Wiley, Chichester (2001)
Book Google Scholar
Geerts, F., Goethals, B., Mielikäinen, T.: Tiling databases. In: Suzuki, E., Arikawa, S. (eds.) DS 2004. LNCS (LNAI), vol. 3245, pp. 278–289. Springer, Heidelberg (2004)
Chapter Google Scholar
Gionis, A., Mannila, H., Seppänen, J.K.: Geometric and combinatorial tiles in 0–1 data. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, pp. 173–184. Springer, Heidelberg (2004)
Chapter Google Scholar
Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Discov. 23(1), 169–214 (2011)
Article MathSciNet MATH Google Scholar
Kontonasios, K.N., Bie, T.D.: An information-theoretic approach to finding informative noisy tiles in binary databases. In: SDM, pp. 153–164. SIAM (2010)
Google Scholar
Tatti, N., Vreeken, J.: Comparing apples and oranges. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011, Part III. LNCS, vol. 6913, pp. 398–413. Springer, Heidelberg (2011)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

ISTI-CNR, Pisa, Italy
Claudio Lucchese & Raffaele Perego
DAIS - Università Ca’ Foscari, Venice, Italy
Salvatore Orlando

Authors

Claudio Lucchese
View author publications
You can also search for this author in PubMed Google Scholar
Salvatore Orlando
View author publications
You can also search for this author in PubMed Google Scholar
Raffaele Perego
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Salvatore Orlando .

Editor information

Editors and Affiliations

University of Science and Technology, Rolla, Missouri, USA
Sanjay Madria
Osaka University, Osaka, Japan
Takahiro Hara

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lucchese, C., Orlando, S., Perego, R. (2015). Supervised Evaluation of Top-k Itemset Mining Algorithms. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science(), vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_7

Download citation

DOI: https://doi.org/10.1007/978-3-319-22729-0_7
Published: 05 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22728-3
Online ISBN: 978-3-319-22729-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics