TopPI: An Efficient Algorithm for Item-Centric Mining

Kirchgessner, Martin; Leroy, Vincent; Termier, Alexandre; Amer-Yahia, Sihem; Rousset, Marie-Christine

doi:10.1007/978-3-319-43946-4_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9829)

Included in the following conference series:

International Conference on Big Data Analytics and Knowledge Discovery

Abstract

We introduce TopPI, a new semantics and algorithm designed to mine long-tailed datasets. For each item, regardless of its frequency, TopPI finds the k most frequent closed itemsets that the item belongs to. For example, in our retail dataset, TopPI finds the itemset “nori seaweed, wasabi, sushi rice, soy sauce”, which occurs in only 133 store receipts out of 290 million. It also finds the itemset “milk, puff pastry”, which appears 152,991 times. Thanks to dynamic threshold adjustment and a suitable pruning strategy, TopPI efficiently traverses the relevant parts of the search space and can be parallelized on multi-core machines. Our experiments on datasets with different characteristics show that TopPI performs well and outperforms state-of-the-art mining algorithms. We also show experimentally on real datasets that TopPI allows the analyst to explore and discover valuable itemsets.
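To make the item-centric semantics concrete, the following is a minimal brute-force sketch in Python. It is not the TopPI algorithm: it has no dynamic threshold adjustment, pruning, or parallelism, and it enumerates every candidate itemset, so it only suits toy data. The function names (`closed_itemsets`, `top_k_per_item`) and the `receipts` data are hypothetical, and the sketch assumes that a closed singleton counts among the itemsets an item belongs to.

```python
from collections import defaultdict
from itertools import combinations


def closed_itemsets(transactions):
    """Map each closed itemset (frozenset) to its support, by brute force."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    closed = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            covering = [t for t in transactions if candidate <= t]
            if not covering:
                continue  # candidate never occurs
            # The closure of a candidate is the intersection of all
            # transactions containing it; it has the same support.
            closure = frozenset.intersection(*covering)
            closed[closure] = len(covering)
    return closed


def top_k_per_item(transactions, k):
    """For each item, keep the k most frequent closed itemsets containing it."""
    per_item = defaultdict(list)
    for itemset, support in closed_itemsets(transactions).items():
        for item in itemset:
            per_item[item].append((support, itemset))
    return {
        # Sort by decreasing support; break ties on the sorted item names.
        item: sorted(entries, key=lambda e: (-e[0], sorted(e[1])))[:k]
        for item, entries in per_item.items()
    }


# Hypothetical toy receipts: "milk" is frequent, the sushi items are rare.
receipts = [
    {"milk", "pastry"},
    {"milk", "pastry"},
    {"milk", "bread"},
    {"milk"},
    {"nori", "wasabi", "rice"},
    {"nori", "wasabi", "rice"},
]
top = top_k_per_item(receipts, k=2)
```

In this toy data the sushi itemset is still reported for "nori", "wasabi" and "rice" even though it occurs in only two receipts, whereas a single global frequency threshold high enough to drop the one-off "milk, bread" pair would also discard it.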

Acknowledgments

This work was partially funded by the Datalyse PIA project.

Author information

Authors and Affiliations

Université Grenoble Alpes, LIG, CNRS, Grenoble, France
Martin Kirchgessner, Vincent Leroy, Sihem Amer-Yahia & Marie-Christine Rousset
Université Rennes 1, INRIA/IRISA, Rennes, France
Alexandre Termier

Corresponding author

Correspondence to Martin Kirchgessner.

Editor information

Editors and Affiliations

University of Science and Technology, Rolla, Missouri, USA
Sanjay Madria
Osaka University, Osaka, Japan
Takahiro Hara

About this paper

Cite this paper

Kirchgessner, M., Leroy, V., Termier, A., Amer-Yahia, S., Rousset, M.-C. (2016). TopPI: An Efficient Algorithm for Item-Centric Mining. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2016. Lecture Notes in Computer Science, vol 9829. Springer, Cham. https://doi.org/10.1007/978-3-319-43946-4_2

DOI: https://doi.org/10.1007/978-3-319-43946-4_2
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43945-7
Online ISBN: 978-3-319-43946-4
eBook Packages: Computer Science, Computer Science (R0)