Two 1%s Don’t Make a Whole: Comparing Simultaneous Samples from Twitter’s Streaming API

Joseph, Kenneth; Landwehr, Peter M.; Carley, Kathleen M.

doi:10.1007/978-3-319-05579-4_10

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8393)

Abstract

We compare samples of tweets from the Twitter Streaming API constructed from different connections that tracked the same popular keywords at the same time. We find that on average, over 96% of the tweets seen in one sample are seen in all others. Those tweets found only in a subset of samples do not significantly differ from tweets found in all samples in terms of user popularity or tweet structure. We conclude they are likely the result of a technical artifact rather than any systematic bias.

Practically, our results show that an infinite number of Streaming API samples are necessary to collect “most” of the tweets containing a popular keyword, and that findings from one sample from the Streaming API are likely to hold for all samples that could have been taken. Methodologically, our approach is extendible to other types of social media data beyond Twitter.
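The overlap statistic reported in the abstract (the share of one sample's tweets that also appear in every other sample) can be illustrated with a short sketch. This is not the authors' code: the function name `shared_fraction`, the connection labels, and the toy tweet IDs are all hypothetical, and it assumes each sample has already been reduced to a set of tweet IDs.

```python
from typing import Dict, Set


def shared_fraction(samples: Dict[str, Set[int]]) -> Dict[str, float]:
    """For each sample, return the fraction of its tweet IDs found in every sample."""
    if not samples:
        return {}
    # Tweet IDs that every connection captured.
    common = set.intersection(*samples.values())
    return {
        name: (len(common) / len(ids) if ids else 0.0)
        for name, ids in samples.items()
    }


# Toy data (assumption): tweet IDs seen by three simultaneous connections
# tracking the same keyword.
samples = {
    "conn_a": {1, 2, 3, 4, 5},
    "conn_b": {1, 2, 3, 4, 6},
    "conn_c": {1, 2, 3, 4},
}

fractions = shared_fraction(samples)
mean_fraction = sum(fractions.values()) / len(fractions)
print(fractions)
print(f"mean shared fraction: {mean_fraction:.3f}")
```

Averaging the per-sample fractions gives a single figure comparable in spirit to the "on average, over 96%" claim, though the paper's exact averaging procedure is not stated on this page.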

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, USA
Kenneth Joseph, Peter M. Landwehr & Kathleen M. Carley

Editor information

Editors and Affiliations

Center for Social Complexity and Department of Computational Social Science, George Mason University, 4400 University Drive, MS 6B2, 22030-4400, Fairfax, VA, USA
William G. Kennedy
University of Arkansas at Little Rock, 2801 South University Avenue, EIT Building, Room 553, 72204, Little Rock, AR, USA
Nitin Agarwal
Department of Computer Engineering, Rochester Institute of Technology, 83 Lomb Memorial Drive, Bldg 09, 14623-5603, Rochester, NY, USA
Shanchieh Jay Yang

About this paper

Cite this paper

Joseph, K., Landwehr, P.M., Carley, K.M. (2014). Two 1%s Don’t Make a Whole: Comparing Simultaneous Samples from Twitter’s Streaming API. In: Kennedy, W.G., Agarwal, N., Yang, S.J. (eds) Social Computing, Behavioral-Cultural Modeling and Prediction. SBP 2014. Lecture Notes in Computer Science, vol 8393. Springer, Cham. https://doi.org/10.1007/978-3-319-05579-4_10

DOI: https://doi.org/10.1007/978-3-319-05579-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05578-7
Online ISBN: 978-3-319-05579-4
eBook Packages: Computer Science, Computer Science (R0)
