Efficient and Scalable Mining of Frequent Subgraphs Using Distributed Graph Processing Systems

Wang, Tongtong; Huang, Hao; Lu, Wei; Peng, Zhe; Du, Xiaoyong

doi:10.1007/978-3-319-91452-7_57

Conference paper
First Online: 13 May 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10827)

Abstract

Mining frequent subgraphs in large-scale graph data sets helps reveal underlying knowledge. Since mining approaches on centralized systems are often bottlenecked by computational capacity, many parallel solutions based on the MapReduce framework have been proposed to scale out the mining process; these usually extract frequent subgraphs iteratively. Nonetheless, the efficiency and scalability of these MapReduce-based approaches are still bounded by the communication cost of passing intermediate results and by the workload becoming unbalanced after a few iterations. In this paper, we propose an efficient and scalable framework for frequent subgraph mining that uses distributed graph processing systems. It adopts a message-passing-free scheme among workers to reduce the communication cost, and uses a task scheduler to balance the workload dynamically. Experimental results on both synthetic and real-world data sets verify the efficacy of the proposed framework.
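
The abstract does not define what makes a subgraph "frequent", so the following is only a minimal sketch under stated assumptions, not the framework proposed in the paper. It assumes the common transaction setting (a database of many small labeled graphs), a support threshold, and the simplest possible patterns, single labeled edges, which iterative miners often count first before growing larger candidates. The edge-triple representation and the names `canonical`, `frequent_edges` and `min_support` are hypothetical.

```python
from collections import Counter

# Assumption: each graph is a list of undirected labeled edges, written as
# (vertex_label, edge_label, vertex_label). This toy representation is
# hypothetical and is not taken from the paper.

def canonical(edge):
    """Order the two endpoint labels so both directions map to one pattern."""
    u, e, v = edge
    return (u, e, v) if u <= v else (v, e, u)

def frequent_edges(graphs, min_support):
    """Return the single-edge patterns that occur in at least min_support graphs."""
    support = Counter()
    for graph in graphs:
        # A pattern counts once per graph, however often it occurs inside it.
        support.update({canonical(edge) for edge in graph})
    return {pattern: count for pattern, count in support.items()
            if count >= min_support}

graphs = [
    [("C", "single", "O"), ("C", "single", "C")],
    [("O", "single", "C"), ("C", "double", "O")],
    [("C", "single", "C"), ("N", "single", "C")],
]

print(frequent_edges(graphs, min_support=2))
```

With the toy database above and `min_support=2`, only the C-single-O and C-single-C edges survive, each with support 2. In an iterative miner, survivors like these would seed the next round of larger candidate subgraphs, which is the step where intermediate results and uneven candidate counts can become costly.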

Acknowledgment

We would like to thank the anonymous reviewers for their helpful and insightful comments. This work was supported in part by the National Natural Science Foundation of China (61502504, 61732014, 61502347, U1711261), and the Technological Innovation Projects of Hubei Province (2017AAA125).

Author information

Authors and Affiliations

School of Information and DEKE, MOE, Renmin University of China, Beijing, China
Tongtong Wang, Wei Lu, Zhe Peng & Xiaoyong Du
State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
Hao Huang

Corresponding author

Correspondence to Wei Lu.

Editor information

Editors and Affiliations

Simon Fraser University, Burnaby, BC, Canada
Jian Pei
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
University of Queensland, Brisbane, QLD, Australia
Shazia Sadiq
University of Western Australia, Crawley, WA, Australia
Jianxin Li

Cite this paper

Wang, T., Huang, H., Lu, W., Peng, Z., Du, X. (2018). Efficient and Scalable Mining of Frequent Subgraphs Using Distributed Graph Processing Systems. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_57

DOI: https://doi.org/10.1007/978-3-319-91452-7_57
Published: 13 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91451-0
Online ISBN: 978-3-319-91452-7
eBook Packages: Computer Science, Computer Science (R0)
