A Fast Two-Stage Algorithm for Computing SimRank and Its Extensions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6185)

Abstract

Similarity estimation is used in many applications, such as recommender systems, cluster analysis, information retrieval, and link prediction. SimRank is a well-known algorithm that measures the similarity of objects based on link structure. We observe that if a node has no in-links, its similarity score with every other node is always zero. Based on this observation, we propose a new algorithm, fast two-stage SimRank (F2S-SimRank), which avoids storing these unnecessary zeros and accelerates the computation without any loss of accuracy, so it uses less computation time and less main memory. Experiments on real and synthetic datasets demonstrate the effectiveness and efficiency of F2S-SimRank.
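The observation in the abstract can be illustrated with a small sketch. This is not the authors' F2S-SimRank, whose details are not given on this page; it is a naive iterative SimRank (following the Jeh and Widom definition) that first sets aside nodes without in-links and stores pair scores only for the rest. The function names `simrank` and `score`, the decay factor `c=0.8`, the iteration count, and the toy graph are all assumptions made for illustration.

```python
from itertools import product


def simrank(in_links, c=0.8, iterations=10):
    """Naive SimRank on a graph given as {node: [in-neighbours]}.

    Illustrative only: c and iterations are assumed values, and this
    is not the F2S-SimRank algorithm from the paper.
    """
    # Stage 1: a node with no in-links scores zero against every other
    # node, so it is left out of the stored table entirely.
    active = [v for v in in_links if in_links[v]]

    # Stage 2: iterate only over pairs of nodes that have in-links.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(active, repeat=2)}
    for _ in range(iterations):
        new = {}
        for a, b in product(active, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            total = sum(score(sim, i, j)
                        for i in in_links[a] for j in in_links[b])
            new[(a, b)] = c * total / (len(in_links[a]) * len(in_links[b]))
        sim = new
    return sim


def score(sim, a, b):
    """Look up a similarity; pairs that were never stored are zero."""
    if a == b:
        return 1.0
    return sim.get((a, b), 0.0)


# Hypothetical toy graph: Univ -> ProfA, Univ -> ProfB,
# ProfA -> StudentA, ProfB -> StudentB. Univ has no in-links.
example = {
    "Univ": [],
    "ProfA": ["Univ"],
    "ProfB": ["Univ"],
    "StudentA": ["ProfA"],
    "StudentB": ["ProfB"],
}

table = simrank(example)
print(len(table))                     # stored pairs: 4 * 4 instead of 5 * 5
print(score(table, "Univ", "ProfA"))  # zero, and never stored
print(score(table, "ProfA", "ProfB"))
```

In this toy graph only one of five nodes lacks in-links, so the saving is small; the benefit of skipping such nodes grows with the share of nodes that have no in-links.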

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jia, X., Liu, H., Zou, L., He, J., Du, X. (2010). A Fast Two-Stage Algorithm for Computing SimRank and Its Extensions. In: Shen, H.T., et al. Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16720-1_6

  • DOI: https://doi.org/10.1007/978-3-642-16720-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16719-5

  • Online ISBN: 978-3-642-16720-1

  • eBook Packages: Computer Science (R0)
