Scalability in Pattern Mining

  • Sebastián Ventura
  • José María Luna
Chapter

Abstract

Pattern mining is a keystone of data analysis: it describes and represents any type of homogeneity and regularity in data. Abundant research has been dedicated to this task, yielding substantial improvements in both efficiency and scalability. Nevertheless, the growing interest in data collection is giving rise to extremely large datasets that hinder the mining process. It is therefore essential to provide solutions that efficiently address the challenges of processing such high-dimensional datasets. This chapter describes different ways of speeding up the pattern mining process, presenting some traditional methods for handling very large data collections as well as new trends in the mining of patterns in Big Data.

Keywords

  • Graphic Processing Unit
  • Association Rule
  • Pattern Mining
  • Frequent Itemsets
  • Mining Association Rule

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Sebastián Ventura (1)
  • José María Luna (1)
  1. Department of Computer Science and Numerical Analysis, University of Cordoba, Cordoba, Spain
