An Online Risk Index for the Cross-Sectional Prediction of New HIV, Chlamydia, and Gonorrhea Diagnoses Across U.S. Counties and Across Years
The present study evaluated the potential use of Twitter data to provide risk indices for sexually transmitted infections (STIs). We developed online risk indices (ORIs) based on tweets to predict new HIV, gonorrhea, and chlamydia diagnoses across U.S. counties and across 5 years. We analyzed over one hundred million tweets from 2009 to 2013 using open-vocabulary techniques and estimated the ORIs for a particular year by entering tweets from that year into multiple semantic models (one for each year). The ORIs were moderately to strongly associated with the actual rates (.35 < rs < .68 for 93% of models), both nationwide and when applied to single states (California, Florida, and New York). Later models were slightly better than older ones at predicting gonorrhea and chlamydia, but not at predicting HIV. The proposed technique, which uses free social media data, provides signals of community health at a high temporal and spatial resolution.
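The association between an ORI and the observed diagnosis rates can be illustrated with a minimal sketch. It assumes a Pearson correlation computed across counties, which the abstract does not specify. The county values and the names `pearson_r`, `ori`, and `observed_rate` are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for five counties in a single year (not study data).
ori = [0.8, 1.9, 3.2, 3.9, 5.1]                 # assumed online risk index per county
observed_rate = [10.0, 22.0, 28.0, 41.0, 52.0]  # assumed diagnoses per 100,000

r = pearson_r(ori, observed_rate)
print(f"r = {r:.2f}")
```

In a setup like the one described, a coefficient of this kind would be computed separately for each infection and each pairing of model year with tweet year.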
Keywords: HIV, Chlamydia, Gonorrhea, Social media, Big data
This work was funded by a National Institutes of Health grant. We are grateful to Travis Sanchez, Patrick S. Sullivan, and Yisi Liu for their help in data collection.
Compliance with Ethical Standards
Conflict of interest
The authors declare that they have no conflict of interest.