An Online Risk Index for the Cross-Sectional Prediction of New HIV, Chlamydia, and Gonorrhea Diagnoses Across U.S. Counties and Across Years

  • Man-pui Sally Chan
  • Sophie Lohmann
  • Alex Morales
  • Chengxiang Zhai
  • Lyle Ungar
  • David R. Holtgrave
  • Dolores Albarracín
Original Paper

Abstract

The present study evaluated the potential use of Twitter data for providing risk indices of STIs. We developed online risk indices (ORIs) based on tweets to predict new HIV, gonorrhea, and chlamydia diagnoses, across U.S. counties and across 5 years. We analyzed over one hundred million tweets from 2009 to 2013 using open-vocabulary techniques and estimated the ORIs for a particular year by entering tweets from the same year into multiple semantic models (one for each year). The ORIs were moderately to strongly associated with the actual rates (.35 < rs < .68 for 93% of models), both nationwide and when applied to single states (California, Florida, and New York). Later models were slightly better than older ones at predicting gonorrhea and chlamydia, but not at predicting HIV. The proposed technique using free social media data provides signals of community health at a high temporal and spatial resolution.
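
As a rough illustration of how an index of this kind might be checked against observed diagnosis rates, the sketch below computes a Pearson correlation between a predicted index and observed rates across counties. The function name, variable names, and toy numbers are hypothetical and are not taken from the study; the study's own semantic models and data are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values: one entry per county for a single year.
predicted_index = [1.2, 0.8, 2.5, 1.9, 0.4]
observed_rate = [10.0, 9.0, 21.0, 14.0, 6.0]

r = pearson_r(predicted_index, observed_rate)
print(f"r = {r:.2f}")
```

Repeating such a calculation per infection and per year would give one coefficient per model, which is the kind of quantity the reported range of rs summarizes.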

Keywords

HIV, Chlamydia, Gonorrhea, Social media, Big data

Notes

Acknowledgements

This work was funded by a National Institutes of Health grant. We are grateful to Travis Sanchez, Patrick S. Sullivan, and Yisi Liu for their help in data collection.

Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

Supplementary material 1 (DOCX 78 kb)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Psychology, University of Illinois at Urbana-Champaign, Champaign, USA
  2. Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA
  3. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
  4. School of Public Health, Johns Hopkins University, Baltimore, USA
