
Incremental Frequent Itemsets Mining with MapReduce

Conference paper, Advances in Databases and Information Systems (ADBIS 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10509)


Abstract

Frequent itemset mining is a common task in data mining. Since the sizes of today's databases go far beyond the capabilities of a single machine, recent studies show how to adapt classical frequent itemset mining algorithms to parallel frameworks such as MapReduce. Even then, a slight database update requires re-running the MapReduce mining algorithm from the beginning on the whole data set, which can be far from optimal. Thus, a variation of these algorithms that handles incremental database updates is desired.
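To make the cost of a full re-run concrete, here is a minimal sketch, in plain Python rather than on a real cluster, of the common pattern of running Apriori as one MapReduce job per itemset size: the mapper emits `(candidate, 1)` pairs and the reducer sums them and applies the support threshold. The function names are hypothetical, the shuffle phase is simulated by a single in-memory reducer, and this is not the implementation from the paper or from any of the works it builds on.

```python
from collections import Counter
from itertools import combinations


def mapper(transaction, k, prev_frequent):
    """Emit (candidate, 1) for each k-subset of one transaction.

    Apriori pruning: a k-itemset is a candidate only if all of its
    (k-1)-subsets were frequent in the previous job (prev_frequent).
    """
    for combo in combinations(sorted(transaction), k):
        cand = frozenset(combo)
        if prev_frequent is None or all(cand - {x} in prev_frequent for x in cand):
            yield cand, 1


def reducer(pairs, min_count):
    """Sum the emitted counts per itemset and keep those meeting min_count."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return {s: c for s, c in counts.items() if c >= min_count}


def mapreduce_apriori(transactions, min_count):
    """Simulate one MapReduce job per itemset size k; every job scans all data."""
    frequent, prev, k = {}, None, 1
    while True:
        pairs = (p for t in transactions for p in mapper(t, k, prev))
        level = reducer(pairs, min_count)
        if not level:
            return frequent
        frequent.update(level)
        prev, k = level, k + 1
```

Every iteration of the loop scans every transaction, so appending even one transaction forces all jobs to be repeated over the full data set; that is the inefficiency an incremental variant tries to avoid.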

This paper presents a general algorithm for incremental frequent itemset mining and shows how to adapt it to the parallel paradigm. It also provides optimizations, specific to a constrained model of MapReduce, that yield an effective algorithm.
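The abstract does not spell out the authors' algorithm, so the following is only a hedged sketch of the general incremental idea, written in the style of the classic FUP update technique rather than the paper's own method. It relies on one fact: with a relative minimum support s, an itemset X that is frequent in the updated database DB ∪ db must be frequent in DB or in db, because if count_DB(X) < s|DB| and count_db(X) < s|db|, their sum is below s(|DB| + |db|). Previously frequent itemsets therefore only need their counts topped up from the increment, and the original data is rescanned only for itemsets that became frequent in the increment (in a MapReduce setting, each such scan would be one job). The names `mine`, `count_support` and `incremental_update` are hypothetical, and `Fraction` is used so threshold comparisons are exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def mine(transactions, min_count):
    """Brute-force level-wise (Apriori-style) miner returning exact support counts."""
    result, prev, k = {}, None, 1
    while True:
        counts = Counter()
        for t in transactions:
            for combo in combinations(sorted(t), k):
                cand = frozenset(combo)
                # Apriori pruning: all (k-1)-subsets must be frequent.
                if prev is None or all(cand - {x} in prev for x in cand):
                    counts[cand] += 1
        level = {s: c for s, c in counts.items() if c >= min_count}
        if not level:
            return result
        result.update(level)
        prev, k = level, k + 1


def count_support(transactions, itemsets):
    """One scan over `transactions`, counting only the given itemsets."""
    counts = {s: 0 for s in itemsets}
    for t in transactions:
        for s in counts:
            if s <= t:
                counts[s] += 1
    return counts


def incremental_update(old_db, old_frequent, increment, min_support: Fraction):
    """Frequent itemsets of old_db + increment, given exact counts in old_frequent
    of every itemset frequent in old_db at the same relative min_support."""
    # Mine only the (small) increment at the same relative threshold.
    inc_frequent = mine(increment, min_support * len(increment))
    # Previously frequent itemsets: scan the increment only.
    inc_counts = count_support(increment, old_frequent)
    updated = {s: c + inc_counts[s] for s, c in old_frequent.items()}
    # Itemsets frequent in the increment but not before: rescan old_db,
    # restricted to these few candidates.
    new_cands = [s for s in inc_frequent if s not in old_frequent]
    old_counts = count_support(old_db, new_cands)
    for s in new_cands:
        updated[s] = old_counts[s] + inc_frequent[s]
    threshold = min_support * (len(old_db) + len(increment))
    return {s: c for s, c in updated.items() if c >= threshold}


if __name__ == "__main__":
    s = Fraction(1, 2)
    old_db = [frozenset(t) for t in ("ab", "ac", "abc", "bc", "ab")]
    increment = [frozenset(t) for t in ("ac", "ac", "cd")]
    old_frequent = mine(old_db, s * len(old_db))
    new = incremental_update(old_db, old_frequent, increment, s)
    # The incremental result must match mining the combined data from scratch.
    assert new == mine(old_db + increment, s * (len(old_db) + len(increment)))
    print(sorted("".join(sorted(x)) for x in new))  # ['a', 'ac', 'b', 'c']
```

In the example, {a, b} drops out because the increment adds no support for it, while {a, c} becomes frequent and is the only itemset whose count has to be recomputed over the original transactions.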

Author information

Correspondence to Kirill Kandalov.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kandalov, K., Gudes, E. (2017). Incremental Frequent Itemsets Mining with MapReduce. In: Kirikova, M., Nørvåg, K., Papadopoulos, G. (eds) Advances in Databases and Information Systems. ADBIS 2017. Lecture Notes in Computer Science, vol 10509. Springer, Cham. https://doi.org/10.1007/978-3-319-66917-5_17

  • DOI: https://doi.org/10.1007/978-3-319-66917-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66916-8

  • Online ISBN: 978-3-319-66917-5

  • eBook Packages: Computer Science, Computer Science (R0)
