Incremental Frequent Itemsets Mining with MapReduce

Kandalov, Kirill; Gudes, Ehud

doi:10.1007/978-3-319-66917-5_17


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10509)

Included in the following conference series:

European Conference on Advances in Databases and Information Systems


Abstract

Frequent itemsets mining is a common task in data mining. Because today’s databases far exceed the capacity of a single machine, recent studies have shown how to adapt classical frequent itemsets mining algorithms to parallel frameworks such as MapReduce. Even so, a slight database update requires re-running the MapReduce mining algorithm from scratch on the whole data set, which can be far from optimal. A variant of these algorithms that supports incremental database updates is therefore desirable.

This paper presents a general algorithm for incremental frequent itemsets mining and shows how to adapt it to the parallel paradigm. It also provides optimizations, specific to a constrained model of MapReduce, that make the algorithm effective.



Author information

Authors and Affiliations

Open University, Ra’anana, Israel
Kirill Kandalov & Ehud Gudes
Ben-Gurion University, Beer-Sheva, Israel
Ehud Gudes


Corresponding author

Correspondence to Kirill Kandalov.

Editor information

Editors and Affiliations

Riga Technical University, Riga, Latvia
Mārīte Kirikova
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Cyprus, Nicosia, Cyprus
George A. Papadopoulos

About this paper

Cite this paper

Kandalov, K., Gudes, E. (2017). Incremental Frequent Itemsets Mining with MapReduce. In: Kirikova, M., Nørvåg, K., Papadopoulos, G. (eds) Advances in Databases and Information Systems. ADBIS 2017. Lecture Notes in Computer Science, vol 10509. Springer, Cham. https://doi.org/10.1007/978-3-319-66917-5_17


DOI: https://doi.org/10.1007/978-3-319-66917-5_17
Published: 25 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66916-8
Online ISBN: 978-3-319-66917-5
eBook Packages: Computer Science, Computer Science (R0)
