A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapReduce Capability

Pandey, Kamlesh Kumar; Shukla, Diwakar; Milan, Ram

doi:10.1007/978-981-15-2071-6_34

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 100)

Abstract

Big data mining is an active area of research used across data-related fields such as communications, computing, biology, and geographical science. Big data is commonly characterized by volume, variety, velocity, variability, value, veracity, and visualization. Data mining extracts needed information, knowledge, hidden patterns, and relations from large datasets in heterogeneous formats collected from multiple sources. Its techniques for big data include classification, clustering, and association. Clustering is the approach used to group similar data and to find hidden patterns and related data. Traditional clustering approaches, such as partition-based, hierarchical, density-based, grid-based, and model-based algorithms, each handle only high volume, high variety, or high velocity, not all three together. Applied to big data mining, these traditional algorithms do not work properly; big data mining needs clustering algorithms that cope with high volume, high variety, and high velocity at once. This paper introduces big data, big data mining, and the concepts behind traditional clustering algorithms. Drawing on theoretical, practical, and existing research perspectives, it categorizes clustering frameworks by volume (dataset size, dimensionality), variety (dataset type, cluster shape), and velocity (scalability, time complexity). It then presents a common framework for scaling and speeding up any type of clustering algorithm with MapReduce, and illustrates this framework using the K-means algorithm.
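
The abstract does not spell out the MapReduce clustering framework itself, so the following is only a minimal single-machine sketch of how K-means is commonly split into map and reduce steps. It is not the authors' implementation. The function names (map_phase, reduce_phase, kmeans_mapreduce), the parameters max_iter and tol, and the Euclidean distance measure are all assumptions chosen for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, (point, 1)) for each point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, (p, 1)


def reduce_phase(pairs, centroids):
    """Reduce step: sum points and counts per key, then average them."""
    sums = {}
    counts = defaultdict(int)
    for idx, (p, n) in pairs:
        if idx not in sums:
            sums[idx] = list(p)
        else:
            sums[idx] = [a + b for a, b in zip(sums[idx], p)]
        counts[idx] += n
    # A centroid that received no points keeps its previous position.
    updated = list(centroids)
    for idx, total in sums.items():
        updated[idx] = tuple(v / counts[idx] for v in total)
    return updated


def kmeans_mapreduce(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until no centroid moves by more than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_mapreduce(data, [(0.0, 0.0), (10.0, 10.0)]))
```

In this sketch each iteration is one map pass followed by one reduce pass. On an actual cluster the map step would run in parallel over partitions of the data, and the per-key sums and counts could be pre-aggregated locally before the reduce step, which is why the mapper emits a count of 1 alongside each point.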

Author information

Authors and Affiliations

Department of Computer Science and Applications, Dr. Harisingh Gour Vishwavidyalaya, Sagar, 470003, Madhya Pradesh, India
Kamlesh Kumar Pandey, Diwakar Shukla & Ram Milan

Corresponding author

Correspondence to Kamlesh Kumar Pandey.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Sagar Institute of Research and Technology, Bhopal, Madhya Pradesh, India
Rajesh Kumar Shukla
Department of Computer Science and Engineering, University Teaching Department, Rajiv Gandhi Technical University (State Technological University), Bhopal, Madhya Pradesh, India
Jitendra Agrawal
School of Information Technology, Rajiv Gandhi Technical University (State Technological University), Bhopal, Madhya Pradesh, India
Sanjeev Sharma
Department of Computer Science and Engineering, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India
Narendra S. Chaudhari
Department of Computer Science and Engineering, Indian Institute of Technology BHU, Varanasi, Uttar Pradesh, India
K. K. Shukla

About this paper

Cite this paper

Pandey, K.K., Shukla, D., Milan, R. (2020). A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapReduce Capability. In: Shukla, R., Agrawal, J., Sharma, S., Chaudhari, N., Shukla, K. (eds) Social Networking and Computational Intelligence. Lecture Notes in Networks and Systems, vol 100. Springer, Singapore. https://doi.org/10.1007/978-981-15-2071-6_34

DOI: https://doi.org/10.1007/978-981-15-2071-6_34
Published: 22 March 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2070-9
Online ISBN: 978-981-15-2071-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
