Efficient Graph Similarity Join with Scalable Prefix-Filtering Using MapReduce

  • Conference paper
Web-Age Information Management (WAIM 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8485))

Abstract

The graph similarity join retrieves all pairs of similar graphs in a graph dataset. In this paper, we propose an efficient MapReduce-friendly algorithm that tackles the graph similarity join problem on large-scale graph datasets. In particular, the efficiency of our algorithm rests on three techniques: 1) scalable prefix-filtering that remains applicable when the q-gram alphabet does not fit in memory; 2) an effective candidate reduction strategy that greatly cuts down the data communication cost; 3) a two-round data access scheme that reduces the data access overhead. Extensive experiments on large-scale real and synthetic datasets demonstrate that our proposal outperforms the state-of-the-art method, with better scalability and faster running time.
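
To make the idea of prefix-filtering concrete, the following is a minimal single-machine sketch of the general technique, not the algorithm proposed in the paper. It assumes each graph has already been reduced to a plain set of q-gram tokens and that two graphs are candidates when they share at least `min_overlap` tokens; the paper's actual q-gram definition, threshold derivation, and MapReduce job layout are not given in the abstract. The names `prefix_filter_candidates`, `verify`, and `min_overlap` are hypothetical. The two loops imitate a map phase (emit each prefix token with the record id) and a reduce phase (pair up ids that share a prefix token).

```python
from collections import Counter, defaultdict
from itertools import combinations


def prefix_filter_candidates(records, min_overlap):
    """Return id pairs whose token sets may share >= min_overlap tokens.

    records: dict mapping a record id to a set of string tokens
    (assumption: stand-ins for the q-grams of a graph).
    """
    # Global token order: rarer tokens first, ties broken by the token itself.
    freq = Counter(tok for toks in records.values() for tok in toks)

    def order(tok):
        return (freq[tok], tok)

    # "Map" phase: emit (prefix token -> record id).
    buckets = defaultdict(list)
    for rid, toks in records.items():
        if len(toks) < min_overlap:
            continue  # too small to ever reach the overlap threshold
        # Two sets sharing >= min_overlap tokens must share a token
        # within their first len - min_overlap + 1 tokens in global order.
        prefix_len = len(toks) - min_overlap + 1
        for tok in sorted(toks, key=order)[:prefix_len]:
            buckets[tok].append(rid)

    # "Reduce" phase: every pair of ids in the same bucket is a candidate.
    candidates = set()
    for rids in buckets.values():
        for a, b in combinations(sorted(rids), 2):
            candidates.add((a, b))
    return candidates


def verify(records, candidates, min_overlap):
    """Keep only candidate pairs that really meet the overlap threshold."""
    return {
        (a, b)
        for a, b in candidates
        if len(records[a] & records[b]) >= min_overlap
    }


# Toy data: token sets standing in for the q-grams of three graphs.
sample = {
    "g1": {"a", "b", "c", "d"},
    "g2": {"b", "c", "d", "e"},
    "g3": {"w", "x", "y", "z"},
}
sample_candidates = prefix_filter_candidates(sample, min_overlap=3)
sample_result = verify(sample, sample_candidates, min_overlap=3)
```

On this toy input only the pair of `g1` and `g2` survives filtering, so `g3` is never compared against anything. Ordering tokens by ascending frequency keeps the prefix buckets small, which in a distributed setting would correspond to less data shuffled between map and reduce tasks.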

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Pang, J., Gu, Y., Xu, J., Bao, Y., Yu, G. (2014). Efficient Graph Similarity Join with Scalable Prefix-Filtering Using MapReduce. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_43

  • DOI: https://doi.org/10.1007/978-3-319-08010-9_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08009-3

  • Online ISBN: 978-3-319-08010-9

  • eBook Packages: Computer Science, Computer Science (R0)
