A high-performance parallel coral reef optimization for data clustering

  • Focus
Soft Computing

Abstract

Developing high-performance data analytics systems is a critical research topic of the big data era and has received significant attention from different disciplines since the 2000s. Many recent works have attempted to build such systems to handle the large amounts of data (volume) that come from different information systems (variety) and are typically created in a short time (velocity). In particular, several recent studies have shown that metaheuristic algorithms can be applied to many data mining optimization problems and can find higher-quality results than traditional deterministic algorithms. This paper presents a high-performance clustering algorithm for big data analytics systems. The proposed algorithm is based on a recent metaheuristic, coral reef optimization with substrate layers (CRO-SL), to obtain better clustering results. To improve both effectiveness and efficiency, the proposed CRO-SL scheme is also deployed on a cloud computing platform to reduce the response time of the data analytics system. Simulation results show that, in terms of the sum of squared errors, the proposed algorithm provides better clustering results than the other clustering algorithms compared in this research: k-means, the genetic k-means algorithm, particle swarm optimization, and the simple coral reef optimization algorithm.
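The abstract compares algorithms by the sum of squared errors (SSE) but does not spell the measure out. The following is a minimal sketch of the standard definition, in which each point contributes its squared Euclidean distance to its assigned centroid. The function names and the toy data are illustrative assumptions, not taken from the paper, and the sketch says nothing about how CRO-SL itself searches for centroids.

```python
# Illustrative sketch only: helper names and toy data are assumptions,
# not the paper's implementation.

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]

def sum_of_squared_errors(points, centroids, labels):
    """SSE: total squared distance from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[k]) for p, k in zip(points, labels))

# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_centroids = [(0.0, 1.0), (10.0, 1.0)]   # one centroid per pair
poor_centroids = [(5.0, 0.0), (5.0, 2.0)]    # centroids placed between the pairs

sse_good = sum_of_squared_errors(
    points, good_centroids, assign_to_nearest(points, good_centroids))
sse_poor = sum_of_squared_errors(
    points, poor_centroids, assign_to_nearest(points, poor_centroids))
```

A lower SSE means the points lie closer to their centroids, so under this measure the first set of centroids is the better clustering of the toy data.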

Notes

  1. CRO-SL, presented in Salcedo-Sanz et al. (2016), is an extended version of the coral reefs optimization algorithm.

References

  • Agrawal D, Das S, El Abbadi A (2011) Big data and cloud computing: current state and future opportunities. In: Proceedings of the international conference on extending database technology, pp 530–533

  • Ashish T, Kapil S, Manju B (2018) Parallel bat algorithm-based clustering using MapReduce. In: Proceedings of the networking communication and data knowledge engineering. Springer Singapore, pp 73–82

  • Bandyopadhyay S, Maulik U (2002) An evolutionary technique based on K-means algorithm for optimal clustering in \(R^N\). Inf Sci 146(1):221–237

  • Baraniuk RG (2011) More is less: signal processing and the data deluge. Science 331(6018):717–719

  • Blum C, Roli A (2003) Metaheuristics in combinatorial optimization: overview and conceptual comparison. ACM Comput Surv 35(3):268–308

  • Bryan K, Cunningham P, Bolshakova N (2005) Biclustering of expression data using simulated annealing. In: Proceedings of the IEEE symposium on computer-based medical systems (CBMS’05), pp 383–388

  • Daoudi M, Hamena S, Benmounah Z, Batouche M (2014) Parallel differential evolution clustering algorithm based on MapReduce. In: Proceedings of the international conference of soft computing and pattern recognition, pp 337–341

  • Debuse JC, Rayward-Smith VJ (1997) Feature subset selection within a simulated annealing data mining algorithm. J Intell Inf Syst 9(1):57–81

  • Dheeru D, Karra Taniskidou E (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

  • Fang W, Lau KK, Lu M, Xiao X, Lam CK, Yang PY, He B, Luo Q, Sander PV, Yang K (2008) Parallel data mining on graphics processors. Tech. Rep., The Hong Kong University of Science and Technology

  • Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery in databases. AI Mag 17:37–54

  • Ficco M, Esposito C, Palmieri F, Castiglione A (2018) A coral-reefs and game theory-based approach for optimizing elastic cloud resource allocation. Future Gener Comput Syst 78:343–352

  • Glover F, Kochenberger GA (eds) (2003) Handbook of metaheuristics. Springer, Berlin

  • Handl J, Meyer B (2007) Ant-based and swarm-based clustering. Swarm Intell 1(2):95–113

  • Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco. ISBN 0123814790, 9780123814791

  • Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU (2015) The rise of “big data” on cloud computing: review and open research issues. Inf Syst 47:98–115

  • Hoffman P, Grinstein G, Pinkney D (1999) Dimensional anchors: a graphic primitive for multidimensional multivariate information visualizations. In: Proceedings of the workshop on new paradigms in information visualization and manipulation in conjunction with the ACM international conference on information and knowledge management, pp 9–16

  • Huang DW, Lin J (2010) Scaling populations of a genetic algorithm for job shop scheduling problems using MapReduce. In: Proceedings of the IEEE second international conference on cloud computing technology and science, pp 780–785

  • Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of international conference on neural networks, vol 4, pp 1942–1948

  • Krishna K, Murty MN (1999) Genetic \(k\)-means algorithm. IEEE Trans Syst Man Cybern Part B 29(3):433–439

  • Lai JZC, Liaw Y-C, Liu J (2008) A fast VQ codebook generation algorithm using codeword displacement. Pattern Recognit Lett 41(1):315–319

  • Laney D (2001) 3D data management: controlling data volume, velocity, and variety. Tech. Rep, META Group

  • Liu B (2009) Web data mining: exploring hyperlinks, contents, and usage data. Springer, Berlin

  • Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc VLDB Endow 5(8):716–727

  • Lu Y, Cao B, Rego C, Glover F (2018) A Tabu search based clustering algorithm and its parallel implementation on Spark. Appl Soft Comput 63:97–109

  • MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1: statistics, pp 281–297

  • Maimon O (2009) Soft computing for knowledge discovery and data mining. Springer, Berlin. ISBN 144194351X, 9781441943514

  • Medeiros IG, Xavier JC, Canuto AMP (2015) Applying the coral reefs optimization algorithm to clustering problems. In: Proceedings of the international joint conference on neural networks, pp 1–8

  • Mitra S, Pal SK, Mitra P (2002) Data mining in soft computing framework: a survey. IEEE Trans Neural Netw 13(1):3–14

  • Ostfeld A, Salomons S (2005) A hybrid genetic-instance based learning algorithm for CE-QUAL-W2 calibration. J Hydrol 310(1):122–142

  • Parpinelli RS, Lopes HS, Freitas AA (2002) Data mining with an ant colony optimization algorithm. IEEE Trans Evolut Comput 6(4):321–332

  • Radviz (2018) https://cran.r-project.org/web/packages/Radviz/vignettes/single_cell_projections.html

  • Raghupathi W, Raghupathi V (2014) Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2(3):1–10

  • Sagiroglu S, Sinanc D (2013) Big data: a review. In: Proceedings of the international conference on collaboration technologies and systems (CTS), pp 42–47

  • Salcedo-Sanz S, Ser JD, Gil-López S, Landa-Torres I, Portilla-Figueras JA (2013a) The coral reefs optimization algorithm: an efficient meta-heuristic for solving hard optimization problems. In: Proceedings of the applied stochastic models and data analysis international conference, pp 751–758

  • Salcedo-Sanz S, Pastor-Sánchez A, Gallo-Marazuela D, Portilla-Figueras A (2013b) A novel coral reefs optimization algorithm for multi-objective problems. In: Proceedings of the intelligent data engineering and automated learning, pp 326–333

  • Salcedo-Sanz S, Ser JD, Landa-Torres I, Gil-López S, Portilla-Figueras JA (2014a) The coral reefs optimization algorithm: a novel metaheuristic for efficiently solving optimization problems. Sci World J 2014:1–15

  • Salcedo-Sanz S, García-Díaz P, Portilla-Figueras J, Ser JD, Gil-López S (2014b) A coral reefs optimization algorithm for optimal mobile network deployment with electromagnetic pollution control criterion. Appl Soft Comput 24:239–248

  • Salcedo-Sanz S, Gallo-Marazuela D, Pastor-Sánchez A, Carro-Calvo L, Portilla-Figueras A, Prieto L (2014c) Offshore wind farm design with the coral reefs optimization algorithm. Renew Energy 63:109–115

  • Salcedo-Sanz S, Casanova-Mateo C, Pastor-Sánchez A, Sánchez-Girón M (2014d) Daily global solar radiation prediction based on a hybrid coral reefs optimization—extreme learning machine approach. Sol Energy 105:91–98

  • Salcedo-Sanz S, Pastor-Sánchez A, Ser JD, Prieto L, Geem Z (2015) A coral reefs optimization algorithm with harmony search operators for accurate wind speed prediction. Renew Energy 75:93–101

  • Salcedo-Sanz S, Camacho-Gómez C, Molina D, Herrera F (2016) A coral reefs optimization algorithm with substrate layers and local search for large scale global optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp 3574–3581

  • Sarazin T, Azzag H, Lebbah M (2014) SOM clustering using Spark-MapReduce. In: Proceedings of the IEEE international parallel distributed processing symposium workshops, pp 1727–1734

  • Selim SZ, Alsultan K (1991) A simulated annealing algorithm for the clustering problem. Pattern Recognit 24(10):1003–1008

  • Shmueli G, Bruce PC, Yahav I, Patel NR, Lichtendahl KC Jr (2017) Data mining for business analytics: concepts, techniques, and applications in R. Wiley, Hoboken

  • Teijeiro D, Pardo XC, González P, Banga JR, Doallo R (2016) Implementing parallel differential evolution on Spark. In: Proceedings of the applications of evolutionary computation. Springer, pp 75–90

  • Tsai C, Lai C, Chiang M, Yang LT (2014) Data mining for internet of things: a survey. IEEE Commun Surv Tutor 16(1):77–97

  • Tsai C-W, Huang K-W, Yang C-S, Chiang M-C (2015) A fast particle swarm optimization for clustering. Soft Comput 19(2):321–338

  • Tsai C-W, Chang H-C, Hu K-C, Chiang M-C (2016) Parallel coral reef algorithm for solving JSP on Spark. In: Proceedings of the IEEE international conference on systems, man, and cybernetics, pp 1872–1877

  • Tsai C-W, Liu S-J, Wang Y-C (2018) A parallel metaheuristic data clustering framework for cloud. J Parallel Distrib Comput 116:39–49

  • Tseng L-Y, Chen C (2008) Multiple trajectory search for large scale global optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp 3052–3059

  • User locations until 2012 (FINLAND) (2018). http://cs.uef.fi/mopsi/data/

  • van der Merwe DW, Engelbrecht AP (2003) Data clustering using particle swarm optimization. Proc Evolut Comput 1:215–220

  • Wang Y-C, Tsai C-W (2008) An efficient coral reef optimization with substrate layers for clustering problem on Spark. In: Proceedings of IEEE international conference on systems, man and cybernetics

  • Wang B, Yin J, Hua Q, Wu Z, Cao J (2016) Parallelizing \(k\)-means-based clustering on Spark. In: Proceedings of the international conference on advanced cloud and big data, pp 31–36

  • Wu R, Zhang B, Hsu M (2009) Clustering billions of data points using GPUs. In: Proceedings of the combined workshops on unconventional high performance computing workshop plus memory access workshop, pp 1–6

  • Wu B, Wu G, Yang M (2012) A MapReduce based ant colony optimization approach to combinatorial optimization problems. In: Proceedings of the international conference on natural computation, pp 728–732

  • Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678

  • Zhou J, Yu K-M, Wu B-C (2010) Parallel frequent patterns mining algorithm on GPU. In: Proceedings of the IEEE international conference on systems, man and cybernetics, pp 435–440

  • Güngör Z, Ünler A (2008) K-harmonic means data clustering with tabu-search method. Appl Math Model 32(6):1115–1125

Funding

This work was supported in part by the Ministry of Science and Technology of Taiwan, R.O.C., under Contracts MOST106-2221-E-005-094, MOST107-2221-E-005-029, MOST107-2221-E-005-022 and MOST107-2218-E-005-018.

Author information

Corresponding author

Correspondence to Huan Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Communicated by A.K. Sangaiah, H. Pham, M.-Y. Chen, H. Lu, F. Mercaldo.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tsai, CW., Chang, WY., Wang, YC. et al. A high-performance parallel coral reef optimization for data clustering. Soft Comput 23, 9327–9340 (2019). https://doi.org/10.1007/s00500-019-03950-3

  • DOI: https://doi.org/10.1007/s00500-019-03950-3
