Applied Intelligence, Volume 49, Issue 7, pp 2762–2779

Level-2 node clustering coefficient-based link prediction

  • Ajay Kumar
  • Shashank Sheshar Singh
  • Kuldeep Singh
  • Bhaskar Biswas
Article

Abstract

Link prediction finds missing links in static networks or future (new) links in dynamic networks, and it is crucial to analyzing how networks evolve. Over the last decade, many works on link prediction in social networks have been presented. Link prediction plays a pivotal role in the analysis of complex networks, including social and biological networks. In this work, we propose a new approach to link prediction based on the level-2 node clustering coefficient. The approach defines the notion of a level-2 common node and its corresponding clustering coefficient, which extracts clustering information from the level-2 common neighbors of the seed node pair, and computes the similarity score from this information. We simulated the existing methods (three classical methods, viz. common neighbors, resource allocation, and preferential attachment; the clustering coefficient-based methods CCLP and NLC; local naive Bayes-based common neighbors (LNBCN); Cannistraci-Alanis-Ravasi (CAR); and the recent Node2vec method) and the proposed method on 11 real-world network datasets. Accuracy is estimated using four well-known single-point summary statistics, viz. area under the ROC curve (AUROC), area under the precision-recall curve (AUPR), average precision, and recall. Comprehensive experiments on the four metrics and 11 datasets show that the proposed method performs better. The time complexity of the proposed method is also given and is of the same order as that of the existing method CCLP. A statistical test (the Friedman test) shows that the proposed method differs significantly from the existing methods considered in the paper.
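To make the baselines named in the abstract concrete, the following is a minimal Python sketch of four of the existing similarity scores (common neighbors, resource allocation, preferential attachment, and CCLP in its commonly stated form as the sum of the local clustering coefficients of the common neighbors) together with a rank-based AUROC estimate. The toy edge list, the function names, and the tie-handling rule are illustrative assumptions, not taken from the paper. The proposed level-2 score is not reproduced here because the abstract does not give its formula.

```python
from itertools import combinations

# Hypothetical toy graph (illustrative data only, not one of the paper's datasets).
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """CN score: number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])


def resource_allocation(adj, x, y):
    """RA score: each common neighbor z contributes 1 / degree(z)."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """PA score: product of the degrees of x and y."""
    return len(adj[x]) * len(adj[y])


def local_clustering(adj, z):
    """Fraction of pairs of z's neighbors that are themselves linked."""
    k = len(adj[z])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[z], 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def cclp(adj, x, y):
    """CCLP-style score (assumed form): sum of the local clustering
    coefficients of the common neighbors of x and y."""
    return sum(local_clustering(adj, z) for z in adj[x] & adj[y])


def auroc(pos_scores, neg_scores):
    """Probability that a true missing link outscores a non-link,
    counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


adj = build_adjacency(EDGES)
scores = {
    "CN": common_neighbors(adj, 0, 3),
    "RA": resource_allocation(adj, 0, 3),
    "PA": preferential_attachment(adj, 0, 3),
    "CCLP": cclp(adj, 0, 3),
}
```

In a typical evaluation, a fraction of the observed links is hidden as a test set, each method scores the hidden links and an equal-sized or exhaustive sample of non-links, and `auroc` is applied to the two score lists; the exact protocol used in the paper is not stated in the abstract.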

Keywords

Link prediction · Level-2 node clustering coefficient · Similarity measures · Social network

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India
