The VLDB Journal, Volume 28, Issue 1, pp 99–122

Accelerating pairwise SimRank estimation over static and dynamic graphs

  • Yue Wang
  • Lei Chen
  • Yulin Che
  • Qiong Luo
Regular Paper

Abstract

Measuring similarities among different vertices is a fundamental problem in graph analysis. Among different similarity measurements, SimRank is one of the most promising and popular. In reality, instead of computing the whole similarity matrix, people often issue SimRank queries in a pairwise manner, each of which needs to estimate an approximate SimRank value within a specified accuracy for a given pair of nodes. These pairwise SimRank queries are often processed on real-life graphs, which typically evolve over time, requiring efficient algorithms that can query pairwise SimRank under dynamic graph updates. However, current single-pair SimRank solutions are either static or inefficient in handling dynamic cases with good-quality results. Observing that the sample size is the major factor that determines the efficiency and the accuracy in Monte Carlo methods to estimate pairwise SimRank, in this paper, we propose three algorithms to query pairwise SimRank over static and dynamic graphs efficiently, by using different sample reduction strategies. The accuracy of our algorithms is guaranteed by the different invariants we propose for pairwise SimRank. We show that our algorithms outperform the state-of-the-art static and dynamic solutions for pairwise SimRank estimation.
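
As a rough illustration of the Monte Carlo setting the abstract refers to, the sketch below estimates a single-pair SimRank value by sampling pairs of reverse random walks and averaging c^τ, where τ is the step at which the two walks first meet. It is a plain textbook-style baseline, not one of the three algorithms proposed in the paper. The function names, the adjacency format (`in_neighbors` as a dict of lists), the decay factor `c = 0.6`, and the walk truncation `max_steps` are all assumptions made for this example. The sample size comes from Hoeffding's inequality for samples bounded in [0, 1], which shows why the number of samples drives both cost and accuracy.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Samples needed so that P(|estimate - mean| >= epsilon) <= delta
    for i.i.d. samples bounded in [0, 1] (Hoeffding's inequality)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def mc_simrank(in_neighbors, u, v, c=0.6, epsilon=0.05, delta=0.01,
               max_steps=20, rng=None):
    """Baseline Monte Carlo estimate of SimRank s(u, v).

    in_neighbors: dict mapping a node to a list of its in-neighbors
                  (hypothetical input format chosen for this sketch).
    Walks are truncated after max_steps, which adds a small bias on top
    of the sampling error controlled by epsilon and delta.
    """
    rng = rng or random.Random()
    if u == v:
        return 1.0
    n = hoeffding_sample_size(epsilon, delta)
    total = 0.0
    for _ in range(n):
        a, b = u, v
        for step in range(1, max_steps + 1):
            na = in_neighbors.get(a, ())
            nb = in_neighbors.get(b, ())
            if not na or not nb:
                break  # a walk is stuck, so this pair of walks never meets
            # Each walk moves to a uniformly random in-neighbor.
            a, b = rng.choice(na), rng.choice(nb)
            if a == b:
                total += c ** step  # first meeting at this step
                break
    return total / n


if __name__ == "__main__":
    # Toy graph: u and v share the single in-neighbor w.
    graph = {"u": ["w"], "v": ["w"], "w": []}
    print(mc_simrank(graph, "u", "v", rng=random.Random(0)))
```

In the toy graph both walks reach `w` after one step in every sample, so the estimate is c up to floating-point rounding. Because the sample size grows as 1/ε², tightening the accuracy target quickly dominates the running time, which is the cost that sample reduction strategies aim to cut.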

Keywords

SimRank, Dynamic graph, Graph theory, Similarity measure

Notes

Acknowledgements

Yue Wang and Lei Chen are supported in part by the Hong Kong RGC GRF Project 16214716, National Grand Fundamental Research 973 Program of China under Grant 2014CB340303, the National Science Foundation of China (NSFC) under Grant No. 61729201, Science and Technology Planning Project of Guangdong Province, China, No. 2015B010110006, Huawei Co. Ltd Collaboration Project, YBCB2009041-45, Hong Kong ITC ITF grants ITS/391/15FX and ITS/212/16FP, Microsoft Research Asia Collaborative Research Grant, and WeChat-HKUST Joint Lab on Artificial Intelligence Technology, RDC-17182280. Yulin Che and Qiong Luo are supported in part by grants 16206414 from the Hong Kong Research Grants Council and MRA11EG01 from Microsoft.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Kowloon, China