A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapReduce Capability

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 100)

Abstract

Big data mining is an active area of research that serves data-intensive fields such as communication, computing, biology, and geographical science. Big data is commonly characterized by volume, variety, velocity, variability, value, veracity, and visualization. Data mining extracts needed information, knowledge, hidden patterns, and relations from large, heterogeneous datasets collected from multiple sources. Its main techniques are classification, clustering, and association. Clustering groups similar data and reveals hidden patterns and related data. Traditional clustering approaches, such as partition-based, hierarchical, density-based, grid-based, and model-based algorithms, each handle only high volume, high variety, or high velocity, not all three together. Applied directly to big data mining, these algorithms do not work properly; big data requires clustering algorithms that cope with high volume, high variety, and high velocity at once. This paper introduces big data, big data mining, and traditional clustering algorithms. From theoretical, practical, and existing-research perspectives, it categorizes clustering algorithms within a framework based on volume (dataset size, dimensionality), variety (dataset type, cluster shape), and velocity (scalability, time complexity). It then presents a common framework for scaling and speeding up any type of clustering algorithm with MapReduce and illustrates this framework using the K-means algorithm.
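The abstract does not reproduce the paper's MapReduce K-means framework, so the following is only a minimal single-process sketch of how K-means is commonly decomposed into map and reduce steps, not the authors' implementation. The function names (`kmeans_map`, `kmeans_combine`, `kmeans_reduce`, `kmeans_mapreduce`) and the parameters `max_iter` and `tol` are hypothetical, and a real deployment would run the map and reduce steps on a cluster framework rather than in plain Python lists.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step: emit (index of nearest centroid, (point, count=1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_combine(a, b):
    """Merge two partial (coordinate sum, count) values for the same cluster."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def kmeans_reduce(pairs, centroids):
    """Reduce step: average the points of each cluster into a new centroid."""
    partial = {}
    for key, value in pairs:
        partial[key] = kmeans_combine(partial[key], value) if key in partial else value
    updated = list(centroids)  # clusters that received no points keep their centroid
    for key, (total, count) in partial.items():
        updated[key] = tuple(s / count for s in total)
    return updated


def kmeans_mapreduce(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce rounds until the centroids stop moving."""
    for _ in range(max_iter):
        pairs = [kmeans_map(p, centroids) for p in points]  # parallelizable per point
        updated = kmeans_reduce(pairs, centroids)
        shift = max(dist(c, u) for c, u in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans_mapreduce(data, [(0, 0), (10, 10)]))
    # prints [(0.0, 1.0), (10.0, 11.0)]
```

Because each map call depends only on one point and the current centroids, the map step can be distributed across data partitions, while the reduce step needs only per-cluster sums and counts. That small amount of shared state is what makes the pattern attractive for high-volume data.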

Author information

Corresponding author

Correspondence to Kamlesh Kumar Pandey .

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pandey, K.K., Shukla, D., Milan, R. (2020). A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapReduce Capability. In: Shukla, R., Agrawal, J., Sharma, S., Chaudhari, N., Shukla, K. (eds) Social Networking and Computational Intelligence. Lecture Notes in Networks and Systems, vol 100. Springer, Singapore. https://doi.org/10.1007/978-981-15-2071-6_34
