
Optimizing Partition Granularity, Membership Function Parameters, and Rule Bases of Fuzzy Classifiers for Big Data by a Multi-objective Evolutionary Approach

  • Marco Barsacchi
  • Alessio Bechini
  • Pietro Ducange
  • Francesco Marcelloni

Abstract

Classical data mining algorithms are considered inadequate to manage the volume, variety, velocity, and veracity aspects of big data. The advent of a number of open-source cluster-computing frameworks has opened new interesting perspectives for handling the volume and velocity features. In this context, thanks to their capability of coping with vague and imprecise information, distributed fuzzy models appear to be particularly suitable for handling the variety and veracity features of big data. Moreover, the interpretability of fuzzy models may assume a particular relevance in the context of big data mining. In this work, we propose a novel approach for generating, out of big data, a set of fuzzy rule–based classifiers characterized by different optimal trade-offs between accuracy and interpretability. We extend a state-of-the-art distributed multi-objective evolutionary learning scheme, implemented under the Apache Spark environment. In particular, we exploit a recently proposed distributed fuzzy decision tree learning approach for generating an initial rule base that serves as input to the evolutionary process. Furthermore, we integrate the evolutionary learning scheme with an ad hoc strategy for the granularity learning of the fuzzy partitions, along with the optimization of both the rule base and the fuzzy set parameters. Experimental investigations show that the proposed approach is able to generate fuzzy rule–based classifiers that are significantly less complex than the ones generated by the original multi-objective evolutionary learning scheme, while keeping the same accuracy levels.
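Two ideas mentioned in the abstract can be illustrated with a small sketch: the granularity of a fuzzy partition (the number of fuzzy sets defined on an attribute) and the selection of classifiers that offer the best trade-offs between accuracy and complexity. The code below is only an illustration under stated assumptions, not the authors' distributed Spark implementation: it assumes attributes normalized to [0, 1], strong uniform triangular partitions, and two objectives to minimize (classification error and a complexity count). All function names are hypothetical.

```python
from typing import List, Tuple

Triangle = Tuple[float, float, float]  # (left, core, right), an assumed encoding


def triangular_partition(granularity: int) -> List[Triangle]:
    """Build a strong uniform triangular partition of [0, 1] with `granularity` fuzzy sets."""
    if granularity < 2:
        raise ValueError("granularity must be at least 2")
    step = 1.0 / (granularity - 1)
    return [((i - 1) * step, i * step, (i + 1) * step) for i in range(granularity)]


def membership(x: float, fuzzy_set: Triangle) -> float:
    """Membership degree of x in a triangular fuzzy set."""
    left, core, right = fuzzy_set
    if x == core:
        return 1.0
    if left < x < core:
        return (x - left) / (core - left)
    if core < x < right:
        return (right - x) / (right - core)
    return 0.0


def pareto_front(solutions: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Keep the non-dominated (error, complexity) pairs; both objectives are minimized."""

    def dominates(a: Tuple[float, int], b: Tuple[float, int]) -> bool:
        # a dominates b if it is no worse on both objectives and differs from b
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


if __name__ == "__main__":
    partition = triangular_partition(5)  # five fuzzy sets on one attribute
    print([round(membership(0.3, fs), 2) for fs in partition])
    # Hypothetical (error, complexity) values for five candidate classifiers
    candidates = [(0.10, 50), (0.12, 20), (0.12, 30), (0.20, 10), (0.25, 15)]
    print(pareto_front(candidates))
```

Lowering the granularity of a partition reduces the number of linguistic terms a rule can use, which is one way a learning scheme can trade a small amount of accuracy for a simpler, more readable classifier.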

Keywords

Big data mining; Multi-objective evolutionary fuzzy systems; Fuzzy classification models; Distributed algorithms

Notes

Funding Information

This work has been partially supported by the University of Pisa under grant PRA_2017 “IoT e Big Data: metodologie e tecnologie per la raccolta e l’elaborazione di grosse moli di dati.” Moreover, the implementation of the described approach is part of the development efforts for the projects “SIBILLA” and “TALENT,” co-financed by Regione Toscana under the framework POR-FESR 2014-2020 - Bando 2.

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with the active participation of humans. Furthermore, this article does not contain any studies on animals. The data collected and processed will be used solely for research related to this work, and it will be ensured that they do not allow any of the authors of such data to be identified.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Dipartimento di Ingegneria dell’Informazione, University of Pisa, Pisa, Italy
  2. SMART Engineering Solutions, Technologies (SMARTEST) Research Centre, eCAMPUS University, Novedrate, Italy
