Abstract
The graph similarity join retrieves all pairs of similar graphs in a graph dataset. In this paper, we propose an efficient MapReduce-friendly algorithm that tackles the graph similarity join problem on large-scale graph datasets. In particular, the efficiency of our algorithm comes from: 1) scalable prefix-filtering that still works when the q-gram alphabet is too large to fit in memory; 2) an effective candidate reduction strategy that greatly cuts down the data communication cost; 3) a two-round data access scheme that reduces the data access overhead. Extensive experiments on large-scale real and synthetic datasets demonstrate that our proposal outperforms the state-of-the-art method, with better system scalability and shorter running time.
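The prefix-filtering idea named in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's algorithm: it shows only the generic set-overlap form of prefix filtering, where each record is a set of tokens (standing in for a graph's q-grams) and two records can share at least `min_overlap` tokens only if their prefixes under a common global order share a token. The function names, the frequency-based ordering, and the toy records are assumptions made for illustration; the abstract does not give the paper's q-gram definition, its prefix-length rule for graph edit distance, or its handling of an alphabet that exceeds memory.

```python
from collections import Counter, defaultdict
from itertools import combinations


def prefix(tokens, order, min_overlap):
    """First len(tokens) - min_overlap + 1 distinct tokens under the global order."""
    ranked = sorted(set(tokens), key=lambda t: (order[t], t))
    keep = len(ranked) - min_overlap + 1
    return ranked[:max(keep, 0)]


def candidate_pairs(records, min_overlap):
    """Return record-id pairs that share at least one prefix token."""
    # Global order (assumed): rarest token first, so prefixes are selective.
    freq = Counter(t for toks in records.values() for t in set(toks))

    # "Map" step: emit (prefix token, record id).
    buckets = defaultdict(set)
    for rid, toks in records.items():
        for t in prefix(toks, freq, min_overlap):
            buckets[t].add(rid)

    # "Reduce" step: records grouped under the same token become candidates.
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy data: each record is the token set of one graph.
records = {
    "g1": ["a", "b", "c", "d"],
    "g2": ["b", "c", "d", "e"],
    "g3": ["w", "x", "y", "z"],
}
print(candidate_pairs(records, min_overlap=3))  # {('g1', 'g2')}
```

Only the surviving candidate pairs would then need full verification, which is where a filter of this kind saves work; pairs such as g1 and g3 are never compared.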
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Pang, J., Gu, Y., Xu, J., Bao, Y., Yu, G. (2014). Efficient Graph Similarity Join with Scalable Prefix-Filtering Using MapReduce. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_43
DOI: https://doi.org/10.1007/978-3-319-08010-9_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science, Computer Science (R0)