A Fast Two-Stage Algorithm for Computing SimRank and Its Extensions

Jia, Xu; Liu, Hongyan; Zou, Li; He, Jun; Du, Xiaoyong

doi:10.1007/978-3-642-16720-1_6


Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6185)

Abstract

Similarity estimation is used in many applications, such as recommender systems, cluster analysis, information retrieval, and link prediction. SimRank is a well-known algorithm that measures the similarity of objects based on link structure. We observe that if a node has no in-links, its similarity score with every other node is always zero. Based on this observation, we propose a new algorithm, fast two-stage SimRank (F2S-SimRank), which avoids storing unnecessary zeros and accelerates the computation. Without any loss of accuracy, it uses less computation time and occupies less main memory. Experiments on real and synthetic datasets demonstrate the effectiveness and efficiency of F2S-SimRank.
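
The observation in the abstract can be illustrated with a small sketch. This is not the authors' F2S-SimRank, whose details are not given here; it is a naive iterative SimRank, following the standard definition of Jeh and Widom, run once over all nodes and once over only the nodes that have in-links. The function name `simrank_pairs`, the toy graph, the decay factor `C = 0.8`, and the iteration count are all assumptions made for illustration.

```python
from itertools import combinations


def simrank_pairs(in_links, candidates, C=0.8, iterations=5):
    """Naive iterative SimRank over unordered pairs of distinct candidates.

    in_links maps each node to the list of its in-neighbours.
    Only pairs drawn from `candidates` are stored; any pair that is
    not stored is treated as having similarity 0.
    """
    pairs = list(combinations(sorted(candidates), 2))
    scores = {p: 0.0 for p in pairs}

    def lookup(i, j, table):
        # s(i, i) = 1 by definition; unstored pairs count as 0.
        if i == j:
            return 1.0
        return table.get((i, j) if i < j else (j, i), 0.0)

    for _ in range(iterations):
        updated = {}
        for a, b in pairs:
            ia, ib = in_links.get(a, ()), in_links.get(b, ())
            if not ia or not ib:
                # A node without in-links has similarity 0 to every other node.
                updated[(a, b)] = 0.0
                continue
            total = sum(lookup(i, j, scores) for i in ia for j in ib)
            updated[(a, b)] = C * total / (len(ia) * len(ib))
        scores = updated
    return scores


# Hypothetical toy graph: "a" has no in-links.
in_links = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["d"]}

# Baseline: store a score for every pair of nodes.
full = simrank_pairs(in_links, in_links.keys())

# Reduced run: first drop nodes without in-links, then iterate on the rest.
with_in_links = [n for n, preds in in_links.items() if preds]
reduced = simrank_pairs(in_links, with_in_links)

print(len(full), len(reduced))
```

On this toy graph every pair involving `a` scores zero in the full run, and the reduced run stores fewer pairs while assigning the same score to each pair it keeps. Nodes without in-links still contribute through the `lookup` rule `s(i, i) = 1`, which is why they can be left out of the stored table without changing the result.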



Author information

Authors and Affiliations

Key Labs of Data Engineering and Knowledge Engineering, Ministry of Education, China
Xu Jia, Li Zou, Jun He & Xiaoyong Du
Department of Computer Science, Renmin University of China, 100872, China
Xu Jia, Li Zou, Jun He & Xiaoyong Du
Department of Management Science and Engineering, Tsinghua University, 100084, China
Hongyan Liu


Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
David R. Cheriton School of Computer Science, University of Waterloo, Canada
M. Tamer Özsu
Peking University, China
Lei Zou
Renmin University of China, China
Jiaheng Lu
National University of Singapore, Singapore
Tok-Wang Ling
Northeastern University, 110004, Shenyang, China
Ge Yu
College of Computer Science, Zhejiang University, 310027, Hangzhou, P.R. China
Yi Zhuang
University of Melbourne, Australia
Jie Shao


About this paper

Cite this paper

Jia, X., Liu, H., Zou, L., He, J., Du, X. (2010). A Fast Two-Stage Algorithm for Computing SimRank and Its Extensions. In: Shen, H.T., et al. Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16720-1_6


DOI: https://doi.org/10.1007/978-3-642-16720-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16719-5
Online ISBN: 978-3-642-16720-1
eBook Packages: Computer Science, Computer Science (R0)
