
A Big Data Approach for the Extraction of Fuzzy Emerging Patterns

  • Ángel Miguel García-Vico
  • Pedro González
  • Cristóbal José Carmona
  • María José del Jesus

Abstract

Nowadays, the growth of available data, known as big data, together with machine learning techniques, is changing our lives. Extracting insights into the phenomena underlying the data is key to improving decision-making processes. Emerging pattern mining describes these phenomena through the discriminative characteristics that distinguish the outputs of interest, a very important capability in machine learning. However, emerging pattern mining algorithms for big data environments have not yet been widely developed. This paper presents BD-EFEP, the first multi-objective evolutionary algorithm for emerging pattern mining in big data environments. BD-EFEP introduces novelties for emerging pattern mining, such as a MapReduce approach that improves the efficiency of evaluating individuals and a token-competition-based procedure that promotes the extraction of simple, general and reliable emerging pattern models. An experimental study on datasets with a high number of examples shows the advantages of the proposed algorithm for the emerging pattern mining task in big data problems. The results show that the approach used by BD-EFEP opens new research lines for the extraction of highly descriptive emerging patterns in big data environments.
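The two mechanisms named in the abstract, MapReduce-based evaluation and token competition, can be illustrated with a small sketch. It is not BD-EFEP's implementation: the pattern encoding (a dict mapping a feature index to a triangular fuzzy set), the minimum t-norm, the fuzzy growth-rate definition, ranking by growth rate alone instead of by several objectives, and the simplified token competition are all assumptions made for illustration. The map and reduce steps also run over an in-memory list of partitions rather than on a cluster framework such as Hadoop or Spark. The map step computes partial fuzzy counts of one pattern on each partition, the reduce step sums them, and the totals give the pattern's fuzzy support in the target class and in the remaining classes, whose ratio is the growth rate. Token competition then treats each target-class example as a token: patterns are visited from best to worst, each seizes the tokens it covers, and a pattern that seizes no new token is discarded as redundant.

```python
from functools import reduce

# Illustrative sketch only, not BD-EFEP's actual code. Hypothetical encoding:
# a pattern is {feature_index: (a, b, c)}, a conjunction of triangular fuzzy sets;
# an example is (features, label); the data are split into a list of partitions.

def triangular(x, a, b, c):
    """Membership degree of x in the triangular fuzzy set (a, b, c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def matching_degree(pattern, features):
    """Fuzzy AND of all conditions using the minimum t-norm (an assumption)."""
    return min((triangular(features[i], *fs) for i, fs in pattern.items()), default=1.0)


def map_partition(pattern, target, partition):
    """Map step: partial fuzzy counts of one pattern on one partition."""
    tp = fp = 0.0
    n_target = n_other = 0
    for features, label in partition:
        degree = matching_degree(pattern, features)
        if label == target:
            tp += degree
            n_target += 1
        else:
            fp += degree
            n_other += 1
    return (tp, fp, n_target, n_other)


def reduce_counts(left, right):
    """Reduce step: element-wise sum of two partial count tuples."""
    return tuple(x + y for x, y in zip(left, right))


def growth_rate(pattern, target, partitions):
    """Fuzzy support in the target class divided by fuzzy support elsewhere."""
    partial = (map_partition(pattern, target, p) for p in partitions)
    tp, fp, n_target, n_other = reduce(reduce_counts, partial)
    sup_target = tp / n_target if n_target else 0.0
    sup_other = fp / n_other if n_other else 0.0
    if sup_other == 0.0:
        # Zero support outside the target class: a jumping emerging pattern,
        # provided the pattern has some support in the target class.
        return float("inf") if sup_target > 0.0 else 0.0
    return sup_target / sup_other


def token_competition(patterns, target, partitions):
    """Simplified token competition: every target-class example is a token."""
    examples = [example for partition in partitions for example in partition]
    ranked = sorted(patterns, key=lambda p: growth_rate(p, target, partitions), reverse=True)
    captured, kept = set(), []
    for pattern in ranked:
        # Tokens this pattern covers that no better-ranked pattern has taken yet.
        seized = {i for i, (features, label) in enumerate(examples)
                  if label == target and i not in captured
                  and matching_degree(pattern, features) > 0.0}
        if seized:  # patterns that seize no new token are discarded as redundant
            captured |= seized
            kept.append(pattern)
    return kept


# Hypothetical toy data: one numeric feature, two partitions, classes "pos"/"neg".
partitions = [
    [([1.0], "pos"), ([1.5], "pos"), ([8.0], "neg")],
    [([2.0], "pos"), ([7.0], "neg"), ([9.0], "neg")],
]
low = {0: (0.0, 1.0, 5.0)}     # hypothetical label "x is LOW"
medium = {0: (0.0, 2.0, 8.5)}  # hypothetical label "x is MEDIUM"

print(growth_rate(low, "pos", partitions))  # inf
print(growth_rate(medium, "pos", partitions))
print(token_competition([medium, low], "pos", partitions) == [low])  # True
```

In this toy data the hypothetical LOW pattern never matches a `neg` example, so its growth rate is infinite; it is ranked first and seizes every `pos` token, so the overlapping MEDIUM pattern seizes nothing and is filtered out, which is the kind of redundancy reduction token competition is meant to provide.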

Keywords

Emerging pattern mining, Evolutionary algorithms, Fuzzy systems, Big data

Notes

Funding Information

This study was funded by the Spanish Ministry of Economy and Competitiveness under project TIN2015-68454-R and FPI 2016 scholarship reference BES-2016-077738 (FEDER Funds).

Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, Interuniversity Andalusian Institute on Data Science and Computational Intelligence, University of Jaén, Jaén, Spain
  2. Leicester School of Pharmacy, De Montfort University, Leicester, UK
