Crawling Social Web with Cluster Coverage Sampling

  • Conference paper
Software Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 731)

Abstract

A social network can be viewed as a huge container of nodes and the relationship edges between them. Covering every node in the analysis is impractical because of the network's gigantic size. The solution is to take a sample by collecting a few nodes, and the relationships among them, from the huge network. This sample is treated as a representative of the complete network, and the analysis is carried out on it. How closely the results of the analysis match reality depends largely on how closely the sample resembles the actual network. Sampling is therefore one of the major challenges in social network analysis. Most social networks are scale-free and exhibit overlapping clusters. This paper develops a robust social Web crawler that uses a sampling algorithm which considers the clustered view of the social graph. A sample is a good representative of the network if it has a clustered view similar to that of the actual graph.
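
The idea of a sample that preserves the clustered view can be illustrated with a small sketch. This is not the sampling algorithm developed in the paper, which the abstract does not specify; it is a minimal illustration that assumes the cluster memberships are already known and simply draws a share of nodes from every cluster, so that no cluster is left out. The function names, the `fraction` parameter, and the toy graph are all hypothetical.

```python
import math
import random


def cluster_coverage_sample(clusters, fraction, seed=0):
    """Pick about `fraction` of each cluster's nodes, at least one per cluster.

    Assumption: `clusters` maps a cluster id to its member nodes and is
    given in advance. Clusters may overlap.
    """
    rng = random.Random(seed)
    sample = set()
    for members in clusters.values():
        ordered = sorted(members)  # fixed order so a seed gives repeatable runs
        if not ordered:
            continue  # nothing to draw from an empty cluster
        k = min(len(ordered), max(1, math.ceil(fraction * len(ordered))))
        sample.update(rng.sample(ordered, k))
    return sample


def induced_edges(adjacency, sample):
    """Keep only the edges whose two endpoints were both sampled."""
    return {
        (u, v)
        for u in sample
        for v in adjacency.get(u, ())
        if v in sample and u < v  # u < v stores each undirected edge once
    }


# Toy undirected graph with two overlapping clusters (node 3 is in both).
adjacency = {
    1: [2, 3], 2: [1, 3], 3: [1, 2, 4, 5],
    4: [3, 5, 6], 5: [3, 4, 6], 6: [4, 5],
}
clusters = {"A": [1, 2, 3], "B": [3, 4, 5, 6]}

sample = cluster_coverage_sample(clusters, fraction=0.5, seed=42)
edges = induced_edges(adjacency, sample)
print(sorted(sample), sorted(edges))
```

Drawing from each cluster separately guarantees that every cluster appears in the sample, which uniform node sampling does not. A real crawler would have to discover the clusters while crawling rather than receive them as input.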


Author information

Corresponding author

Correspondence to Atul Srivastava.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Srivastava, A., Anuradha, Gupta, D.J. (2019). Crawling Social Web with Cluster Coverage Sampling. In: Hoda, M., Chauhan, N., Quadri, S., Srivastava, P. (eds) Software Engineering. Advances in Intelligent Systems and Computing, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-10-8848-3_10

  • DOI: https://doi.org/10.1007/978-981-10-8848-3_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8847-6

  • Online ISBN: 978-981-10-8848-3

  • eBook Packages: Engineering, Engineering (R0)
