
On Sampling Type Distribution from Heterogeneous Social Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6635)

Abstract

Social network analysis has recently drawn the attention of many researchers. With advances in communication technologies, the scale of social networks is growing rapidly. To capture the characteristics of very large social networks, graph sampling is an important approach because it does not require visiting the entire network. Prior studies on graph sampling focused on preserving properties such as the degree distribution and clustering coefficient of a homogeneous graph, in which every node and edge is treated equally. However, a node in a social network usually has an attribute indicating a specific group membership or type; for example, people are of different races or nationalities. The links between individuals can thus be classified as intra-connections (between nodes of the same type) or inter-connections (between nodes of different types). It is therefore important to ask whether a sampling method can preserve the node and link type distributions of a heterogeneous social network. In this paper, we formally address this issue. Moreover, we apply five sampling algorithms to real Twitter data sets to evaluate their performance. The results show that respondent-driven sampling works well even when sample sizes are small, whereas random node sampling works best only with large sample sizes.
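The two quantities the abstract asks a sample to preserve, the node type distribution and the split of links into intra- and inter-connections, can be made concrete with a small sketch. Everything below is an illustrative assumption rather than the authors' method: the adjacency-dict format, the function names, the use of induced subgraph edges for both samplers, total variation distance as the error measure, and `chain_referral_sample`, a simplified chain-referral sampler that only mimics the recruitment step of respondent-driven sampling (real RDS also involves population estimators not modeled here).

```python
import random
from collections import Counter, deque

# Hypothetical toy representation (not the paper's data format):
#   adj:   dict mapping each integer node id to the set of its neighbours (undirected)
#   types: dict mapping each node id to its type label, e.g. a nationality


def node_type_distribution(nodes, types):
    """Fraction of the given nodes that belong to each type."""
    counts = Counter(types[v] for v in nodes)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def link_type_distribution(edges, types):
    """Fraction of intra-connections (same type) vs inter-connections (different types)."""
    total = len(edges)
    if total == 0:
        return {"intra": 0.0, "inter": 0.0}
    intra = sum(1 for u, v in edges if types[u] == types[v])
    return {"intra": intra / total, "inter": (total - intra) / total}


def induced_edges(adj, nodes):
    """Edges of the subgraph induced by `nodes`, each stored once as (smaller, larger)."""
    nodes = set(nodes)
    return {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}


def random_node_sample(adj, size, rng):
    """Random node sampling: choose `size` nodes uniformly without replacement."""
    nodes = set(rng.sample(sorted(adj), size))
    return nodes, induced_edges(adj, nodes)


def chain_referral_sample(adj, size, rng, coupons=3):
    """Simplified chain-referral sampling in the spirit of respondent-driven sampling:
    starting from a random seed, each sampled node recruits up to `coupons` unsampled
    neighbours; a new random seed is drawn whenever the referral chain dies out."""
    size = min(size, len(adj))
    sampled, queue = set(), deque()
    while len(sampled) < size:
        if not queue:
            seed = rng.choice(sorted(set(adj) - sampled))
            sampled.add(seed)
            queue.append(seed)
            continue
        u = queue.popleft()
        recruits = sorted(v for v in adj[u] if v not in sampled)
        rng.shuffle(recruits)
        for v in recruits[:coupons]:
            if len(sampled) == size:
                break
            sampled.add(v)
            queue.append(v)
    return sampled, induced_edges(adj, sampled)


def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy graph: 40 type-"A" and 20 type-"B" nodes, where same-type
    # pairs are linked more often than mixed pairs (homophily).
    types = {v: "A" if v < 40 else "B" for v in range(60)}
    adj = {v: set() for v in types}
    for u in types:
        for v in types:
            p = 0.15 if types[u] == types[v] else 0.03
            if u < v and rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)

    full_nodes = node_type_distribution(adj, types)
    full_links = link_type_distribution(induced_edges(adj, adj), types)
    samplers = [("random node", random_node_sample), ("chain referral", chain_referral_sample)]
    for name, sampler in samplers:
        nodes, edges = sampler(adj, 12, rng)
        err_nodes = total_variation(full_nodes, node_type_distribution(nodes, types))
        err_links = total_variation(full_links, link_type_distribution(edges, types))
        print(f"{name}: node-type TV = {err_nodes:.3f}, link-type TV = {err_links:.3f}")
```

A lower total variation distance means the sample's type proportions are closer to those of the full graph. Averaging these errors over many runs and several sample sizes would give a rough, toy-scale analogue of the comparison the abstract describes; the actual evaluation measures used in the paper may differ.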

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, JY., Yeh, MY. (2011). On Sampling Type Distribution from Heterogeneous Social Networks. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20847-8_10

  • DOI: https://doi.org/10.1007/978-3-642-20847-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20846-1

  • Online ISBN: 978-3-642-20847-8

  • eBook Packages: Computer Science (R0)