Efficient and Scalable Mining of Frequent Subgraphs Using Distributed Graph Processing Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10827)

Abstract

Mining frequent subgraphs in large-scale graph data sets helps reveal underlying knowledge. Since mining approaches on centralized systems are often bottlenecked by computational capacity, many parallelized solutions based on the MapReduce framework have been proposed to scale out the mining process, which usually extracts frequent subgraphs iteratively. Nonetheless, the efficiency and scalability of these MapReduce-based approaches are still bounded by the communication cost of passing intermediate results and by the workload becoming unbalanced after a few iterations. In this paper, we propose an efficient and scalable framework for frequent subgraph mining that uses distributed graph processing systems. It adopts a message-passing-free scheme among workers to reduce the communication cost, and uses a task scheduler to dynamically balance the workload. Experimental results on both synthetic and real-world data sets verify the efficacy of the proposed framework.
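The two ideas named in the abstract, workers that exchange no messages with each other and a scheduler that hands out work dynamically, can be illustrated with a minimal single-machine sketch. This is not the paper's framework: the toy graph database `GRAPHS`, the restriction to single-edge patterns, the shared-queue scheduler, and all function names are assumptions made only for illustration.

```python
import queue
import threading

# Hypothetical toy graph database: each graph is a list of labeled edges
# (label_u, label_v). Real inputs would be far larger.
GRAPHS = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("A", "C")],
    [("B", "A"), ("C", "D")],
]


def normalize(edge):
    """Treat edges as undirected by sorting the two endpoint labels."""
    return tuple(sorted(edge))


def support(pattern, graphs):
    """Count the graphs that contain at least one edge matching the pattern."""
    return sum(any(normalize(e) == pattern for e in g) for g in graphs)


def mine_frequent_edges(graphs, min_support, num_workers=2):
    """Return {pattern: support} for single-edge patterns meeting min_support."""
    # Every distinct edge pattern is one task in a shared queue.
    candidates = {normalize(e) for g in graphs for e in g}
    tasks = queue.Queue()
    for pattern in sorted(candidates):
        tasks.put(pattern)

    # Each worker writes only to its own dictionary, so workers never
    # exchange intermediate results with one another.
    local_results = [dict() for _ in range(num_workers)]

    def worker(worker_id):
        # Pull the next task whenever idle (assumed scheduling policy):
        # a worker that finishes early simply takes more tasks.
        while True:
            try:
                pattern = tasks.get_nowait()
            except queue.Empty:
                return
            count = support(pattern, graphs)
            if count >= min_support:
                local_results[worker_id][pattern] = count

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge the per-worker results once, after all workers have finished.
    merged = {}
    for result in local_results:
        merged.update(result)
    return merged


if __name__ == "__main__":
    print(mine_frequent_edges(GRAPHS, min_support=2))
```

Pulling tasks from a shared queue means no worker is bound to a fixed partition, which is one simple way to keep the load even when some patterns are much more expensive to count than others. A full miner would also grow frequent patterns into larger candidates in later iterations, which this sketch omits.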

Acknowledgment

We would like to thank the anonymous reviewers for their helpful and insightful comments. This work was supported in part by the National Natural Science Foundation of China (61502504, 61732014, 61502347, U1711261) and the Technological Innovation Projects of Hubei Province (2017AAA125).

Corresponding author

Correspondence to Wei Lu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Wang, T., Huang, H., Lu, W., Peng, Z., Du, X. (2018). Efficient and Scalable Mining of Frequent Subgraphs Using Distributed Graph Processing Systems. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_57

  • DOI: https://doi.org/10.1007/978-3-319-91452-7_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91451-0

  • Online ISBN: 978-3-319-91452-7

  • eBook Packages: Computer Science, Computer Science (R0)
