Classical data mining algorithms are considered inadequate for managing the volume, variety, velocity, and veracity aspects of big data. The advent of a number of open-source cluster-computing frameworks has opened interesting new perspectives for handling the volume and velocity features. In this context, thanks to their capability of coping with vague and imprecise information, distributed fuzzy models appear to be particularly suitable for handling the variety and veracity features of big data. Moreover, the interpretability of fuzzy models may be particularly relevant in the context of big data mining. In this work, we propose a novel approach for generating, out of big data, a set of fuzzy rule–based classifiers characterized by different optimal trade-offs between accuracy and interpretability. We extend a state-of-the-art distributed multi-objective evolutionary learning scheme, implemented under the Apache Spark environment. In particular, we exploit a recently proposed distributed fuzzy decision tree learning approach for generating an initial rule base that serves as input to the evolutionary process. Furthermore, we integrate the evolutionary learning scheme with an ad hoc strategy for learning the granularity of the fuzzy partitions, along with the optimization of both the rule base and the fuzzy set parameters. Experimental investigations show that the proposed approach is able to generate fuzzy rule–based classifiers that are significantly less complex than those generated by the original multi-objective evolutionary learning scheme, while keeping the same accuracy levels.
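To make the accuracy/interpretability trade-off concrete, the following minimal Python sketch illustrates the ingredients named above. It is not the authors' implementation, and every name and the toy dataset are hypothetical. It assumes uniform strong triangular fuzzy partitions whose granularity (number of fuzzy sets) may differ per feature, rules whose conditions can be "don't care", single-winner inference with the product t-norm, complexity measured as the total number of antecedent conditions, and selection of the candidates that are non-dominated on (error, complexity). The evolutionary search, the fuzzy decision tree initialization, and the Spark distribution are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Triangle = Tuple[float, float, float]  # (left foot, core, right foot)


def triangular_partition(granularity: int, lo: float = 0.0, hi: float = 1.0) -> List[Triangle]:
    """Uniform strong triangular partition of [lo, hi] with `granularity` >= 2 fuzzy sets."""
    step = (hi - lo) / (granularity - 1)
    cores = [lo + i * step for i in range(granularity)]
    return [(core - step, core, core + step) for core in cores]


def membership(x: float, tri: Triangle) -> float:
    """Membership degree of x in a triangular fuzzy set."""
    a, b, c = tri
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


@dataclass
class Rule:
    antecedent: Tuple[Optional[int], ...]  # fuzzy-set index per feature; None = "don't care"
    label: int


@dataclass
class FuzzyRuleBasedClassifier:
    partitions: List[List[Triangle]]  # one partition per feature (granularity may differ)
    rules: List[Rule]

    def matching_degree(self, rule: Rule, x: Sequence[float]) -> float:
        degree = 1.0
        for feature, idx in enumerate(rule.antecedent):
            if idx is not None:  # product t-norm over the conditions actually used
                degree *= membership(x[feature], self.partitions[feature][idx])
        return degree

    def predict(self, x: Sequence[float]) -> Optional[int]:
        """Single-winner inference: label of the rule with the highest matching degree."""
        best_label, best_degree = None, 0.0
        for rule in self.rules:
            degree = self.matching_degree(rule, x)
            if degree > best_degree:
                best_label, best_degree = rule.label, degree
        return best_label  # None if no rule fires (counted as an error)

    def complexity(self) -> int:
        """Total number of antecedent conditions across all rules."""
        return sum(idx is not None for rule in self.rules for idx in rule.antecedent)


def evaluate(clf: FuzzyRuleBasedClassifier, X: Sequence[Sequence[float]],
             y: Sequence[int]) -> Tuple[float, int]:
    """Return the two objectives to minimize: (error rate, complexity)."""
    wrong = sum(clf.predict(x) != label for x, label in zip(X, y))
    return wrong / len(y), clf.complexity()


def pareto_front(candidates):
    """Keep the (error, complexity, name) triples not dominated on both objectives."""
    return [
        p for p in candidates
        if not any(q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
                   for q in candidates)
    ]


if __name__ == "__main__":
    X = [(0.1, 0.2), (0.2, 0.9), (0.4, 0.5), (0.6, 0.1), (0.8, 0.7), (0.9, 0.3)]
    y = [0, 0, 0, 1, 1, 1]
    g3 = triangular_partition(3)
    classifiers = {
        "A": FuzzyRuleBasedClassifier([g3, g3], [Rule((0, None), 0), Rule((2, None), 1)]),
        "B": FuzzyRuleBasedClassifier([triangular_partition(2), g3], [Rule((0, None), 0)]),
        "C": FuzzyRuleBasedClassifier([g3, g3], [Rule((0, 0), 0), Rule((0, 2), 0),
                                                 Rule((2, 0), 1), Rule((2, 2), 1)]),
    }
    candidates = [(*evaluate(clf, X, y), name) for name, clf in classifiers.items()]
    print(pareto_front(candidates))  # [(0.0, 2, 'A'), (0.5, 1, 'B')]
```

Classifier C (error 1/6, eight conditions) is dominated by A (no errors, two conditions), so only A and B survive, each representing a different trade-off between accuracy and complexity. Because the number of misclassified instances is additive, in a distributed setting it could be counted per data partition and then summed; this is the kind of step a Spark-based implementation might distribute, although the sketch above makes no claim about how the authors' code does it.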
Keywords: Big data mining; Multi-objective evolutionary fuzzy systems; Fuzzy classification models; Distributed algorithms
This work was partially supported by the University of Pisa under grant PRA_2017 “IoT e Big Data: metodologie e tecnologie per la raccolta e l’elaborazione di grosse moli di dati” (“IoT and Big Data: methodologies and technologies for collecting and processing large amounts of data”). Moreover, the work carried out in implementing the described approach is part of the efforts for the development of the projects “SIBILLA” and “TALENT,” co-financed by Regione Toscana under the framework POR-FESR 2014-2020 - Bando 2.
Compliance with Ethical Standards
Conflict of Interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with the active participation of humans. Furthermore, this article does not contain any studies on animals. The collected and processed data will be used solely for research related to this work, and it will be ensured that they do not allow any of the authors of such data to be identified.