Practical characterization of large networks using neighborhood information

  • Pinghui Wang
  • Junzhou Zhao
  • Bruno Ribeiro
  • John C. S. Lui
  • Don Towsley
  • Xiaohong Guan
Regular Paper


Characterizing large complex networks such as online social networks through node querying is a challenging task. Network service providers often impose severe constraints on the query rate, limiting the sample size to a small fraction of the network of interest. Various ad hoc subgraph sampling methods have been proposed, but many of them give biased estimates and offer no theoretical guarantees on accuracy. In this work, we focus on developing sampling methods for large networks where querying a node also reveals partial structural information about its neighbors. Our methods are optimized for NoSQL graph databases (if the database can be accessed directly), or utilize the Web APIs available on most major large networks for graph sampling. We show that our sampling method has provable convergence guarantees as an unbiased estimator, and that it is more accurate than state-of-the-art methods. We also explore methods to uncover shortest paths between a subset of nodes and to detect high-degree nodes by sampling only a small fraction of the network of interest. Our results demonstrate that utilizing neighborhood information yields methods that are two orders of magnitude faster than state-of-the-art methods.
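
To make the idea of estimating network properties from a crawl concrete, the following is a minimal sketch of a generic re-weighted random-walk estimator. It is not the neighborhood-aware method summarized above. It assumes a connected undirected graph stored as an adjacency dictionary, and the names `random_walk`, `estimate_mean_degree`, and `estimate_fraction` are hypothetical, chosen only for illustration. A simple random walk visits a node with probability proportional to its degree, so each sample is weighted by the inverse of its degree to correct that bias.

```python
import random


def random_walk(neighbors, start, num_samples, rng):
    """Return num_samples nodes visited by a simple random walk (start included)."""
    visited = [start]
    node = start
    for _ in range(num_samples - 1):
        # Querying `node` reveals its neighbor list; move to a uniform neighbor.
        node = rng.choice(neighbors[node])
        visited.append(node)
    return visited


def estimate_mean_degree(neighbors, samples):
    """Harmonic-mean estimate of the average degree from random-walk samples."""
    inverse_degrees = [1.0 / len(neighbors[v]) for v in samples]
    return len(samples) / sum(inverse_degrees)


def estimate_fraction(neighbors, samples, predicate):
    """Estimate the fraction of nodes satisfying `predicate`, weighting by 1/degree."""
    numerator = 0.0
    denominator = 0.0
    for v in samples:
        weight = 1.0 / len(neighbors[v])
        denominator += weight
        if predicate(v):
            numerator += weight
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative toy graph (an assumption, not data from the paper):
    # a star with center 0 and leaves 1..4.
    star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    rng = random.Random(7)
    samples = random_walk(star, start=0, num_samples=100, rng=rng)
    print(estimate_mean_degree(star, samples))
    print(estimate_fraction(star, samples, lambda v: len(star[v]) >= 2))
```

Without the inverse-degree weights, high-degree nodes would be over-represented in the samples, which is one source of the bias that the abstract attributes to ad hoc sampling methods.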


Crawling, Graph sampling, Online social network, Random walk



The authors wish to thank the anonymous reviewers for their helpful feedback. This work was supported in part by Army Research Office Contract W911NF-12-1-0385, and ARL under Cooperative Agreement W911NF-09-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ARL or the U.S. Government. The work was also supported in part by the National Natural Science Foundation of China (61603290, 61602371, U1301254), Ministry of Education & China Mobile Research Fund (MCM20160311), China Postdoctoral Science Foundation (2015M582663), Natural Science Basic Research Plan in Zhejiang Province of China (LGG18F020016), Natural Science Basic Research Plan in Shaanxi Province of China (2016JQ6034, 2017JM6095), and Shenzhen Basic Research Grant (JCYJ20160229195940462).



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  • Pinghui Wang (1, 2)
  • Junzhou Zhao (3)
  • Bruno Ribeiro (4)
  • John C. S. Lui (5)
  • Don Towsley (6)
  • Xiaohong Guan (1, 7)

  1. MOE Key Laboratory for Intelligent Networks and Network Security, Xi’an Jiaotong University, Xi’an, China
  2. Shenzhen Research Institute of Xi’an Jiaotong University, Shenzhen, China
  3. Division of Computer, Electrical and Mathematical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  4. School of Computer Science, Purdue University, West Lafayette, USA
  5. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong
  6. Department of Computer Science, University of Massachusetts Amherst, Amherst, USA
  7. Center for Intelligent and Networked Systems, Tsinghua University, Beijing, China
