A Literature Review of Social Media-Based Data Mining for Health Outcomes Research

  • Boshu Ru
  • Lixia Yao


Patient-generated health outcomes data are health outcomes created, recorded, gathered, or inferred by or from patients or their caregivers to address a health concern. A critical mass of patient-generated health outcomes data has accumulated on social media websites, offering a potential new data source for health outcomes research in addition to electronic medical records (EMR), claims databases, the FDA Adverse Event Reporting System (FAERS), and survey data. Using the PubMed search engine, we systematically reviewed emerging research on mining patient-generated health outcomes from social media data to understand how these data and state-of-the-art text analysis techniques are used, as well as the related opportunities and challenges. We identified 19 full-text articles published since 2011 as typical examples of this topic, which indicates its novelty. The most frequently analyzed health outcome was medication side effects (in 15 studies), and the most common methods for preprocessing unstructured social media data were named entity recognition, normalization, and text mining-based feature construction. For analysis, researchers adopted content analysis, hypothesis testing, and machine learning models. Compared with EMR, claims, FAERS, and survey data, social media data comprise a large volume of information voluntarily contributed by patients and are not limited to one geographic location. Despite possible limitations, patient-generated health outcomes data from social media might promote further research on treatment effectiveness, adverse drug events, perceived value of treatment, and health-related quality of life. The challenge lies in further improving and customizing text mining methods.
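The three preprocessing steps named above (named entity recognition, normalization, and text mining-based feature construction) can be illustrated with a minimal Python sketch. It is not drawn from any of the reviewed studies: the lexicon, the function names, and the sample post are all hypothetical, and a simple dictionary lookup stands in for the more sophisticated recognition methods a real study would likely use.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping lay phrases to normalized concept names.
# These entries are invented for illustration; a real study would rely
# on a curated medical vocabulary rather than a hand-written dictionary.
SIDE_EFFECT_LEXICON = {
    "threw up": "vomiting",
    "throwing up": "vomiting",
    "nauseous": "nausea",
    "can't sleep": "insomnia",
    "headache": "headache",
}


def extract_side_effects(post):
    """Dictionary-based entity recognition followed by normalization.

    Returns the sorted, de-duplicated list of normalized concepts whose
    lay phrases appear in the post.
    """
    text = post.lower()
    found = set()
    for phrase, concept in SIDE_EFFECT_LEXICON.items():
        # Word boundaries keep the phrase from matching inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(concept)
    return sorted(found)


def bag_of_words(post):
    """Simple text mining-based feature construction: token counts."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return Counter(tokens)


# Hypothetical patient post, written for this example.
post = ("Started the new med last week. Felt nauseous and threw up twice, "
        "now I can't sleep.")
concepts = extract_side_effects(post)
features = bag_of_words(post)
print(concepts)
print(features.most_common(3))
```

Here the normalized concepts could feed content analysis or hypothesis testing, while the token counts are the kind of feature vector a machine learning model might consume.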


Keywords: Social media; Patient-generated health outcomes; Data acquisition; Text mining; Data analysis; Data mining; Health informatics; Systematic literature review



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Software and Information Systems, University of North Carolina at Charlotte, Charlotte, USA
  2. Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, USA
