An Online Risk Index for the Cross-Sectional Prediction of New HIV Chlamydia, and Gonorrhea Diagnoses Across U.S. Counties and Across Years

  • Original Paper
AIDS and Behavior

Abstract

The present study evaluated the potential use of Twitter data for providing risk indices of STIs. We developed online risk indices (ORIs) based on tweets to predict new HIV, gonorrhea, and chlamydia diagnoses, across U.S. counties and across 5 years. We analyzed over one hundred million tweets from 2009 to 2013 using open-vocabulary techniques and estimated the ORIs for a particular year by entering tweets from the same year into multiple semantic models (one for each year). The ORIs were moderately to strongly associated with the actual rates (.35 < rs < .68 for 93% of models), both nationwide and when applied to single states (California, Florida, and New York). Later models were slightly better than older ones at predicting gonorrhea and chlamydia, but not at predicting HIV. The proposed technique using free social media data provides signals of community health at a high temporal and spatial resolution.
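The association between an index and the actual rates, reported above as correlations (rs), can be illustrated with a small sketch. This is not the authors' pipeline: the county values below are invented for illustration, the function name `pearson_r` is hypothetical, and comparing the index against log-transformed rates is an assumption drawn from the notes' mention of non-log-transformed rates and back-transformed ORIs.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one ORI and one diagnosis rate per county (made up).
ori_by_county = [2.1, 3.4, 1.8, 4.0, 2.9]
rate_per_100k = [9.0, 31.0, 5.5, 48.0, 20.0]

# Assumption: the index is compared with log-transformed rates.
log_rates = [math.log(rate) for rate in rate_per_100k]

r = pearson_r(ori_by_county, log_rates)
print(f"r = {r:.2f}")
```

In a real evaluation each county would contribute one pair of values per disease and year, and the correlation would be computed on counties held out from model fitting.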

Notes

  1. We compared random sets of tweets with and without location/coordinate information (N = 3,000). The two sets were remarkably similar in vocabulary size (with = 10,911; without = 11,148), character count per tweet (with = 90; without = 87), and word count per tweet (with = 15; without = 14).

  2. Google Maps Geocoding API is not a free web service (see https://developers.google.com/maps/faq for detailed pricing). Therefore, it is necessary to develop a reliable geo-mapping program for mapping millions of tweets.

  3. Only about 19% of all tweets could be geo-mapped, which left a large share of tweets excluded from the analyses. We cannot compare model performance for tweets with and without location/coordinate information because all new HIV/STI diagnosis rates are reported at the county level.

  4. We present the rank-based residuals of the actual (non-log-transformed) STI rates and the back-transformed ORIs (see Table in Supplementary Information). The overall residuals showed negligible differences for HIV, gonorrhea, and chlamydia, implying that the semantic models showed no strong biases. Altogether, the semantic models using Twitter language provided satisfactory performance in estimating county-level STI risk.
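The rank-based residuals mentioned in note 4 can be sketched as follows. The notes do not define the computation, so this is one plausible reading rather than the authors' procedure: counties are ranked by actual rate and by back-transformed ORI, and the residual is the difference in ranks. The function names, the use of `exp` as the back-transformation (assuming natural-log rates), and the sample values are all assumptions.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def rank_residuals(actual_rates, log_oris):
    """Rank of each actual rate minus rank of its back-transformed ORI."""
    # Assumption: ORIs are on a natural-log scale, so exp() back-transforms.
    predicted_rates = [math.exp(value) for value in log_oris]
    actual_ranks = average_ranks(actual_rates)
    predicted_ranks = average_ranks(predicted_rates)
    return [a - p for a, p in zip(actual_ranks, predicted_ranks)]


# Hypothetical counties: actual rates and log-scale ORIs (made up).
residuals = rank_residuals([5.0, 50.0, 20.0], [2.0, 3.0, 1.0])
print(residuals)
```

Under this reading, a residual near zero means the index placed a county at about the same position as its actual rate did, and residuals that cluster on one side for a subgroup of counties would suggest a systematic bias.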

Acknowledgements

This work was funded by a National Institutes of Health grant. We are grateful to Travis Sanchez, Patrick S. Sullivan, and Yisi Liu for their help in data collection.

Funding

This study was funded by the National Institutes of Health (Grant Number R56 AI114501 to D. A.).

Author information

Corresponding author

Correspondence to Man-pui Sally Chan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 78 kb)

About this article

Cite this article

Chan, Mp.S., Lohmann, S., Morales, A. et al. An Online Risk Index for the Cross-Sectional Prediction of New HIV Chlamydia, and Gonorrhea Diagnoses Across U.S. Counties and Across Years. AIDS Behav 22, 2322–2333 (2018). https://doi.org/10.1007/s10461-018-2046-0

  • DOI: https://doi.org/10.1007/s10461-018-2046-0
