Abstract
Big data mining is an active area of modern scientific research. It is relevant to data-intensive fields such as communication, computing, biology, and geographical science. Big data is commonly characterized by volume, variety, velocity, variability, value, veracity, and visualization. Data mining extracts useful information, knowledge, hidden patterns, and relationships from large datasets. These datasets come in heterogeneous formats and are collected from multiple sources. The main data mining techniques for big data are classification, clustering, and association. Clustering groups similar data, which reveals hidden patterns and related records. The traditional clustering approaches are partition-based, hierarchical, density-based, grid-based, and model-based algorithms. Each of them copes with only one of high volume, high variety, or high velocity. When these algorithms are applied directly to big data, they do not perform adequately. Big data mining therefore needs clustering algorithms that handle high volume, high variety, and high velocity together. This paper introduces big data, big data mining, and the concepts behind traditional clustering algorithms. From theoretical, practical, and existing-research perspectives, it categorizes clustering frameworks by three properties: volume (dataset size, dimensionality), variety (dataset type, cluster shape), and velocity (scalability, time complexity). It then presents a common framework that uses MapReduce to scale and speed up any type of clustering algorithm. This MapReduce clustering framework is illustrated with the K-means algorithm.
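The MapReduce decomposition of K-means mentioned above can be illustrated with a minimal single-process sketch. It is written under stated assumptions and is not the authors' framework. A map step assigns each point to its nearest centroid. A combiner step pre-aggregates partial sums within each input split. A reduce step merges the partial sums into new centroids. The function names (`kmeans_map`, `kmeans_combine`, `kmeans_reduce`, `mapreduce_kmeans`) are hypothetical. No Hadoop API is used, and the shuffle phase is simulated with a dictionary.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(points, centroids):
    """Map (hypothetical): emit (nearest_centroid_index, (point, 1)) per point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, (p, 1)


def kmeans_combine(pairs):
    """Combiner (hypothetical): sum points and counts per cluster within one split."""
    partial = {}
    for idx, (p, n) in pairs:
        if idx in partial:
            s, c = partial[idx]
            partial[idx] = (tuple(a + b for a, b in zip(s, p)), c + n)
        else:
            partial[idx] = (p, n)
    return partial.items()


def kmeans_reduce(partials):
    """Reduce (hypothetical): merge partial sums into one new centroid."""
    total, count = None, 0
    for s, c in partials:
        total = s if total is None else tuple(a + b for a, b in zip(total, s))
        count += c
    return tuple(x / count for x in total)


def mapreduce_kmeans(splits, centroids, max_iter=20, tol=1e-9):
    """Iterate map -> combine -> shuffle -> reduce until centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iter):
        shuffle = defaultdict(list)  # simulated shuffle: cluster index -> partial sums
        for split in splits:  # each split stands in for one mapper's input block
            for idx, partial in kmeans_combine(kmeans_map(split, centroids)):
                shuffle[idx].append(partial)
        new = list(centroids)  # an empty cluster keeps its previous centroid
        for idx, partials in shuffle.items():
            new[idx] = kmeans_reduce(partials)
        shift = max(dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    # Two input splits, as if the dataset were stored in two HDFS blocks.
    splits = [[(1.0, 1.0), (1.5, 2.0), (8.0, 8.0)],
              [(1.0, 0.5), (9.0, 8.5), (8.5, 9.0)]]
    print(mapreduce_kmeans(splits, [(1.0, 1.0), (8.0, 8.0)]))
```

The combiner is a design choice in this sketch, not part of K-means itself. Each split sends only one partial sum and one count per cluster to the reducers, rather than every point. This reduction in shuffle traffic is the main reason the pattern scales with data volume.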
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pandey, K.K., Shukla, D., Milan, R. (2020). A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapReduce Capability. In: Shukla, R., Agrawal, J., Sharma, S., Chaudhari, N., Shukla, K. (eds) Social Networking and Computational Intelligence. Lecture Notes in Networks and Systems, vol 100. Springer, Singapore. https://doi.org/10.1007/978-981-15-2071-6_34
DOI: https://doi.org/10.1007/978-981-15-2071-6_34
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2070-9
Online ISBN: 978-981-15-2071-6
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)