Cluster Computing, Volume 18, Issue 2, pp. 577–585

Mining association rules in big data with NGEP

  • Yunliang Chen
  • Fangyuan Li
  • Junqing Fan
Article

Abstract

Analyses and applications of big data require special technologies to process large volumes of data efficiently. Mining association rules focuses on discovering relations between data items. When mining association rules in big data, conventional methods suffer from a tremendous computational cost and poor efficiency. To address these problems, this study proposes an evolutionary algorithm, Niche-Aided Gene Expression Programming (NGEP). The NGEP algorithm (1) divides individuals into several niches that evolve separately and fuses selected niches according to the similarity of their best individuals, which maintains the dispersion of chromosomes, and (2) adjusts the fitness function to the needs of the underlying application. A number of experiments compare NGEP with the FP-Growth and Apriori algorithms at mining association rules from a dataset of measurements for environment pressure (Iris dataset) and from an Artificial Simulation Database (ASD). Experimental results indicate that NGEP efficiently discovers more association rules (36 vs. 33 vs. 25 in the Iris dataset experiments and 57 vs. 44 vs. 44 in the ASD experiments) with a higher accuracy rate (74.8 % vs. 53.2 % vs. 50.6 % in the Iris dataset experiments and 95.8 % vs. 77.4 % vs. 80.3 % in the ASD experiments), and that its computation time is much lower than that of the other two methods.
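The abstract names two mechanisms: splitting the population into niches and fusing niches whose best individuals are similar, and a fitness function that can be tuned per application. The sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The round-robin niche assignment, the positional (Hamming-style) similarity between chromosome strings, the fusion `threshold`, and the weighted support/confidence fitness are all assumptions, and every function name is hypothetical. The GEP operators that would evolve each niche between fusions are omitted.

```python
from typing import Callable, List, Sequence, Set

# --- (1) Niche division and fusion (hypothetical sketch) ---------------------

def split_into_niches(population: Sequence, k: int) -> List[list]:
    """Assign individuals round-robin to k niches that evolve separately."""
    return [list(population[i::k]) for i in range(k)]


def similarity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length chromosomes agree."""
    if len(a) != len(b):
        raise ValueError("chromosomes must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a) if a else 1.0


def fuse_similar_niches(niches: List[List[str]],
                        fitness: Callable[[str], float],
                        threshold: float) -> List[List[str]]:
    """Merge niches whose best individuals are at least `threshold` similar."""
    fused: List[List[str]] = []
    bests: List[str] = []
    for niche in niches:
        if not niche:
            continue
        best = max(niche, key=fitness)
        for i, other in enumerate(bests):
            if similarity(best, other) >= threshold:
                fused[i] = fused[i] + niche      # absorb the redundant niche
                if fitness(best) > fitness(other):
                    bests[i] = best              # keep the stronger leader
                break
        else:
            fused.append(list(niche))            # distinct enough: keep separate
            bests.append(best)
    return fused


# --- (2) Adjustable rule fitness (assumed weighting) -------------------------

def support(itemset: Set[str], transactions: Sequence[Set[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: Sequence[Set[str]]) -> float:
    """support(A | C) / support(A), or 0.0 when A never occurs."""
    s_a = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / s_a if s_a else 0.0


def rule_fitness(antecedent: Set[str], consequent: Set[str],
                 transactions: Sequence[Set[str]],
                 w_support: float = 0.5, w_confidence: float = 0.5) -> float:
    """Weighted sum of support and confidence; weights are set per application."""
    return (w_support * support(antecedent | consequent, transactions)
            + w_confidence * confidence(antecedent, consequent, transactions))
```

Raising `threshold` fuses fewer niches and so keeps more distinct subpopulations alive, while shifting weight from `w_support` to `w_confidence` favors rarer but more reliable rules; both knobs are illustrative rather than values taken from the paper.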

Keywords

Big data, Association rules, Gene expression programming, Niche

Notes

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 61272314, 61361120098, 61440018), the China Postdoctoral Science Foundation (No. 2014M552112), and the Hubei Natural Science Foundation (No. 2014CF-B904).

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. School of Computer Science, China University of Geosciences (Wuhan), Wuhan, China