Efficient and Scalable Mining of Frequent Subgraphs Using Distributed Graph Processing Systems

  • Tongtong Wang
  • Hao Huang
  • Wei Lu
  • Zhe Peng
  • Xiaoyong Du
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10827)

Abstract

Mining frequent subgraphs in large-scale graph data sets helps reveal underlying knowledge. Because centralized mining approaches are often bottlenecked by computational capacity, many parallelized solutions based on the MapReduce framework have been proposed to scale out the mining process, which usually extracts frequent subgraphs iteratively. Nonetheless, the efficiency and scalability of these MapReduce-based approaches are still bounded by the communication cost of passing intermediate results and by the workload imbalance that arises after a few iterations. In this paper, we propose an efficient and scalable framework for frequent subgraph mining that uses distributed graph processing systems. It adopts a message-passing-free scheme among workers to reduce the communication cost, and uses a task scheduler to dynamically balance the workload. Experimental results on both synthetic and real-world data sets verify the efficacy of the proposed framework.
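
The abstract does not spell out how the task scheduler works, so the following is only a generic illustration of why dynamic scheduling helps when task costs are skewed, not the authors' scheduler. It compares a fixed round-robin assignment with a pull-based scheme in which each task goes to whichever worker becomes idle first. The function names, the worker count, and the task costs are all hypothetical.

```python
import heapq


def static_makespan(costs, n_workers):
    """Assign tasks round-robin up front; return the slowest worker's total load."""
    loads = [0] * n_workers
    for i, cost in enumerate(costs):
        loads[i % n_workers] += cost
    return max(loads)


def dynamic_makespan(costs, n_workers):
    """Give each task, in order, to the worker that becomes idle first."""
    # Min-heap of the times at which each worker becomes free.
    free_times = [0] * n_workers
    heapq.heapify(free_times)
    for cost in costs:
        earliest = heapq.heappop(free_times)
        heapq.heappush(free_times, earliest + cost)
    return max(free_times)


# Hypothetical per-task costs: a few expensive tasks among many cheap ones,
# standing in for the skew that can appear after a few mining iterations.
costs = [8, 1, 1, 1, 8, 1, 1, 1]

print("static :", static_makespan(costs, 4))
print("dynamic:", dynamic_makespan(costs, 4))
```

With these made-up costs, round-robin places both expensive tasks on the same worker, while the pull-based scheme spreads them across workers and finishes sooner.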

Notes

Acknowledgment

We would like to thank the anonymous reviewers for their helpful and insightful comments. This work was supported in part by the National Natural Science Foundation of China (61502504, 61732014, 61502347, U1711261) and the Technological Innovation Projects of Hubei Province (2017AAA125).

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Tongtong Wang (1)
  • Hao Huang (2)
  • Wei Lu (1)
  • Zhe Peng (1)
  • Xiaoyong Du (1)
  1. School of Information and DEKE, MOE, Renmin University of China, Beijing, China
  2. State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
