INTRODUCTION

Over 20 million US adults (8.4% of the population) have substance use disorder (SUD), and drug overdose has overtaken motor vehicle crashes as the leading cause of accidental death.1 While there are many evidence-based SUD programs,2 the overall quality of treatment within facilities treating SUD in the USA has not been well defined or measured. Developing effective quality measures requires understanding the drivers of positive and negative patient experiences within SUD programs. These qualitative narratives can provide an important component in the development of quality metrics for SUD programs through a patient-centered approach.

Digital health platforms offer new opportunities to capture and share these experiences.3 Patients, and their support networks, may begin an initial search for treatment options via the web. When researching a treatment facility’s location or services, they often will encounter an online review or rating provided by common platforms such as Google or Yelp.4, 5

In health domains, these online reviews correlate strongly with reviews obtained through more systematic surveys, such as the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS).6 Star-based ratings can also drive consumer choice for healthcare. For example, after a star-based rating system of nursing homes was released, 1-star facilities typically lost 8% of their market share and 5-star facilities gained more than 6% of their market share.7,8,9 Online star ratings exist for specialized drug treatment facilities, but little is known about the distribution, content, and implications of these reviews. Online reviews may provide a unique opportunity to help guide agencies by identifying common themes and areas of focus for SUD programs.

We sought to describe and analyze online ratings and reviews of specialized drug treatment facilities (SDTFs). SDTFs are facilities that provide coordinated and specialized care, inpatient or outpatient, for individuals with substance use disorder. We focused on online reviews of SDTFs in Pennsylvania, which has been a focal point for addressing opioid use disorder (OUD) and overdose-related deaths. In 2016, Pennsylvania’s death rate was 18.5 deaths per 100,000 persons, compared with the national rate of 13.3 deaths per 100,000; the state remains in the top 25% for overall death rates.10 Moreover, among the nation’s 44 counties with at least 1 million residents, Philadelphia County and Allegheny County (which includes Pittsburgh) had the two highest rates of overdose deaths.11 We used machine learning to automate the identification of patient-centered themes within online ratings. We compared these themes from online reviews with the existing surveys assessing services provided by individual facilities, collected by the Substance Abuse and Mental Health Services Administration (SAMHSA).

METHODS

This was a retrospective analysis of all online reviews and ratings posted to Google and Yelp for SDTFs within Pennsylvania. No reviews were excluded. All Pennsylvania SDTFs were identified using the 2016 National Directory of Drug and Alcohol Abuse Treatment Facilities published by SAMHSA.12 Facilities operating in Pennsylvania but not formally registered with SAMHSA were not included in the analysis. This study was considered exempt by the University of Pennsylvania Institutional Review Board.

Identification of Review Themes

We identified the initial set of Pennsylvania’s SDTFs from the 2016 SAMHSA National Directory of Drug and Alcohol Abuse Treatment Facilities and then identified the Yelp and Google ratings and reviews posted for each facility. Yelp is a dedicated rating and review platform. We used Yelp’s Application Programming Interface (API)13 to identify Yelp reviews of SDTFs across Pennsylvania from July 2010 to August 2018. Google, primarily a search engine, incorporates star ratings and reviews into the results users are presented with when conducting an online search. Google users may enter a rating and review directly through the website, where they are prompted to provide a star rating and an optional narrative review. We manually searched the Google Places pages of Pennsylvania SDTFs to identify Google ratings and reviews, when available, from July 2010 to August 2018.
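As an illustration, matching a SAMHSA-listed facility to its Yelp listing can be sketched as follows. This is a hedged sketch, not the study’s actual code: the endpoint and parameter names follow the current Yelp Fusion API (which may differ from the API version used at the time), the facility name is hypothetical, and the required API key is omitted.

```python
import urllib.parse

# Yelp Fusion API business-search endpoint; requests must carry an API key
# as a Bearer token in the Authorization header (omitted in this sketch).
YELP_SEARCH_URL = "https://api.yelp.com/v3/businesses/search"

def build_search_url(facility_name: str, city: str, state: str = "PA") -> str:
    """Build a Yelp business-search URL for one SAMHSA-listed facility."""
    params = {"term": facility_name, "location": f"{city}, {state}", "limit": 5}
    return YELP_SEARCH_URL + "?" + urllib.parse.urlencode(params)

# The business IDs returned by this search would then be used to request each
# matched facility's ratings and reviews. Facility name below is hypothetical.
url = build_search_url("Example Recovery Center", "Philadelphia")
```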

Yelp and Google each use a proprietary algorithm that considers measures of quality, reliability, and activity to estimate the authenticity of each review.14 For example, this algorithm would prevent a user from leaving multiple reviews on a business page under different usernames. Additionally, each platform states that businesses which buy advertising space on the site are unable to influence which reviews are recommended.

We used Latent Dirichlet Allocation (LDA) to identify the themes expressed within the online reviews. LDA is an automated machine learning method that identifies co-occurrences of words in narrative text to generate topics,15 and it has been used in several studies examining online reviews of hospitals.3, 4, 16 LDA provides a computational approach to analyzing and identifying themes in large data sets of narrative text, such as online reviews or Twitter posts. In this study, we focused on the extremes of the rating scale (low, or 1-star, and high, or 5-star) to maximize the possibility of identifying potential drivers of high or low ratings. We then applied LDA to identify topics. This process analyzes low- or high-star reviews and searches for words that co-occur within individual reviews and recur across multiple reviews, grouping them into an a priori number of topics that can then be labeled by the research team. In this study, three study team members (AA, VW, RM) independently labeled topics and reconciled similar labels. Of note, choosing the number of topics involves a trade-off: too few topics sacrifices the granularity of the data, while too many dilutes the distinctions across topics. The implementation of LDA was provided by the MALLET package.15

Association of Review Themes and Review Star Ratings

We used Differential Language Analysis (DLA)17 to determine which themes were most correlated (Pearson’s r) with the online ratings. DLA provides an open-vocabulary method that builds on the topics generated by LDA: it calculates the topic distribution of each review and then determines which topics are most correlated with review ratings. Here, LDA provided the topics identified within 1-star and 5-star reviews as described above, and DLA determined which of those topics were correlated with 1- or 5-star reviews. The end product of DLA is a correlation coefficient between −1 and 1 relating each topic to the star rating. Due to the bimodal nature of online reviews,13, 14 we first calculated correlations between themes and a binary variable indicating whether the review was a 1-star review, and then calculated correlations between themes and a binary variable indicating whether the review was a 5-star review.
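The correlation step can be sketched as follows. This is a minimal illustration, not the DLA package itself: Pearson’s r is hand-rolled, and the per-review topic weights and 1-star indicators are hypothetical values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical LDA weight of a "wait time" topic in six reviews, paired with
# a binary indicator for whether each review was 1-star.
topic_weight = [0.8, 0.7, 0.1, 0.2, 0.9, 0.0]
is_one_star = [1, 1, 0, 0, 1, 0]
r = pearson_r(topic_weight, is_one_star)  # strongly positive in this toy data
```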

Comparing Online Review Themes with Systematic Survey Data

We calculated correlation coefficients between the themes derived from high or low online reviews and service data from the 2017 National Survey of Substance Abuse Treatment Services (NSSATS), administered annually by SAMHSA. The NSSATS assesses individual US facilities in 3 domains: (1) characteristics of individual facilities and treatment types provided; (2) client counts; and (3) information such as certification or accreditation. The objective of this analysis was to identify whether and how patient-generated themes from online reviews correlated with the current standard approach of assessing facilities’ capabilities through the NSSATS survey. Prior work has demonstrated gaps between the focus of standardized survey domains (e.g., the Hospital Consumer Assessment of Healthcare Providers and Systems) and themes elucidated from patient review data on platforms such as Yelp or Google.3, 14 Given that there is no current standard for SDTFs, we chose to compare online patient themes to the NSSATS.

RESULTS

There are 539 SDTFs in Pennsylvania listed within the SAMHSA directory; 485 of these facilities (90%) had a rating or review posted during the study period. These 485 sites had 7635 Google ratings and 188 Yelp ratings; 5312 ratings (68%) included narrative reviews. The mean number of ratings per facility was 15.3 (SD 24.0, median 7, IQR [3–16]), and the mean word count within a review was 72.8 words (SD 82.28, median 42, IQR [2–58]).

The distribution of ratings was bimodal: 43% 5-star and 34% 1-star (Fig. 1). The simple mean star rating was 3.24 stars (median 4.0) across all facilities in Pennsylvania. When weighting for varying numbers of reviews per facility, using a Bayesian expected average, the estimated star rating at the facility level was 3.28 stars (median 3.29). Figure 2 displays the distribution of weighted facility ratings. Of the 7823 ratings, 68% (5313) had an accompanying review. The average star rating of those with text reviews was 2.9 stars (SD 1.8, median 4). The average star rating of the 32% (2512) without text reviews was 3.8 stars (SD 1.5, median 5). The count of ratings varied across facilities, and facilities with higher or lower absolute counts of reviews displayed a range of average star ratings. Figure 3 displays groups of facilities with similar counts of ratings and their weighted average ratings; as the number of reviews per facility increases, the average star rating converges toward 3 stars.
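The Bayesian expected average used above can be illustrated in miniature. The exact prior is not specified here; one common formulation shrinks each facility’s raw mean toward the global mean, with a prior weight set near the mean review count. The sketch below uses that assumption with the reported global mean (3.24 stars) and mean count (~15 ratings).

```python
def bayesian_average(ratings, prior_mean, prior_weight):
    """Shrink a facility's raw mean rating toward a prior mean; facilities
    with few ratings are pulled more strongly toward the prior."""
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)

# Assumed prior: the global mean rating (3.24 stars) weighted as ~15 ratings.
few_reviews = bayesian_average([5, 5], prior_mean=3.24, prior_weight=15)
many_reviews = bayesian_average([5] * 40, prior_mean=3.24, prior_weight=15)
# Two 5-star ratings stay near the prior mean; forty are pulled closer to 5.
```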

Fig. 1 Distribution of online star ratings of specialized drug treatment facilities in Pennsylvania.

Fig. 2 Bayesian-weighted facility average ratings accounting for volume of reviews and star rating.

Fig. 3 Distribution of weighted facility average ratings and counts of facility ratings.

Online Review Ratings and Themes

We identified five distinct themes correlated with positive and negative narrative reviews. Themes most positively correlated with 5-star ratings were focus on recovery (r = 0.53), helpfulness of staff (r = 0.43), compassionate care (r = 0.37), experienced a life-changing moment (r = 0.32), and professionalism (r = 0.29). Themes most positively correlated with 1-star reviews were wait time (r = 0.41), poor accommodations (r = 0.26), poor phone communication (r = 0.24), medications offered (r = 0.24), and appointment availability (r = 0.23). Table 1 displays themes, correlation coefficients, and example narrative text from online reviews.

Table 1 Online Review Themes Compared with Facility Codes from SAMHSA and Star Ratings

Comparing Themes from Online Reviews with SAMHSA Services

Themes from online reviews showed little overlap with the service codes defined by SAMHSA. While some of the SAMHSA service codes were identified within online reviews, the majority of themes discovered through online reviews were not included within the NSSATS or within SAMHSA’s service codes. Of the 162 listed services, facilities offered an average of 49 services (median 46).

Of the 10 themes most correlated with an online review, 3 aligned with the 14 SAMHSA facility-level categories and 12 service codes (7% of 162). The 7 other online themes are noted in Table 1. In addition, we compared 1-star and 5-star Google reviews with NSSATS survey data. Table 2 shows correlations between services provided at the facility level across Pennsylvania and star ratings. The three services most correlated with 1-star ratings were psychiatric emergency walk-in services (r = − 0.17), state-funded SDTFs (r = − 0.16), and integrated primary care services (r = − 0.15). Five-star correlations included outpatient methadone/buprenorphine or naltrexone (r = 0.15); cash or self-pay SDTFs (r = 0.12); and services for lesbian, gay, bisexual, or transgender (LGBT) clients (r = 0.12). The themes unique to online reviews were also more strongly correlated with low or high online ratings than the existing NSSATS service codes were.

Table 2 National Survey of Substance Abuse Treatment Services (NSSATS) Survey Data Comparison with Online Review Star Ratings

DISCUSSION

Online star ratings exist for specialized drug treatment facilities. This study provides a state-level analysis of the distribution, content, and implications of these reviews. Online reviews provide a patient-centered opportunity to identify themes of focus for quality metrics in SUD programs.

This study has three main findings. First, individuals are posting online reviews and contributing to online ratings of specialized drug treatment facilities using multiple platforms. Over an 8-year period, we identified 7823 online ratings covering over 90% of Pennsylvania’s SDTFs. This study is the first to investigate a large volume of online reviews of SDTFs in order to understand the distribution of ratings and the content within text reviews. The initial analysis reveals a bimodal distribution of low and high star ratings, a range of ratings per facility, and differences in weighted averages of star ratings. This machine learning approach provides a means of analyzing large amounts of rich qualitative data for both inpatient and outpatient settings, which see high volumes of patients and often struggle to capture high rates of patient feedback.4, 16, 18

Second, the narrative component of online reviews reveals themes driving ratings of SDTFs. Online reviews and ratings of healthcare services continue to grow and provide insights on gaps in quality metrics not captured by standardized surveys.4, 14, 19 Our analysis reveals themes not otherwise identified using the current approach. Online reviews reveal patient priorities, with themes such as “focus on recovery” and “compassionate care.” Ratings with an accompanying narrative review (over two-thirds of all ratings) were lower, on average, than ratings with no additional text. This study suggests comment-based reviews offer a trove of information for facilities to learn directly from patients’ experiences, which may be driving lower overall ratings. The positive and negative themes derived from patient reviews identify components of care important to patients but missed by standard approaches, including “poor phone communication” and “life-changing experiences.” As the opioid epidemic continues to surge across the country, the number of individuals seeking treatment will likely increase. Patients, and their support systems, may lean on advice from their primary care providers to guide them. Themes elucidated from review data may aid primary care providers and provide additional insights to help tailor referrals for future patients.

Third, this study suggests an approach to guide a portion of the development of patient-centered quality metrics for SDTFs. Given that online patient review platforms reveal patient priorities organically, these themes reflect patient-centered concerns.3, 14, 20 These patient-centered concerns should not be the only measures of SDTF quality, but they should sit alongside important measures that are less visible to patients or less evaluable by patients, including structural measures such as staffing, process measures such as fidelity to best evidence, and outcome measures such as risk-adjusted success rates. Policy makers and regulators could incorporate themes from these reviews to help guide the creation of quality metrics for SUD treatment. Indeed, as SUD grows across the country, the need for adherence to evidence-based practice and high-quality programs remains important. This focus has become even more evident as advocacy groups, both public and private groups, such as Shatterproof,21 attempt to create and emphasize quality metrics.22

Both this approach and this study are subject to limitations. Organically derived reviews are unstructured in who completes them and in what is contributed. A typical criticism is that these reviews reflect the opinions of those who have something to say, rather than the broader population. Even so, organically derived reviews have the advantage of revealing themes without the pre-direction that structured responses create. Even if they may not be probabilistically representative, they can expand perspectives and, in this case, give voice to a patient population that often has little. That voice can inform the development of systematic surveys, revealing that organic and systematic assessments of quality are more complementary than conflicting. In addition, the ability to quickly analyze large amounts of quantitative and qualitative data may help support more in-depth interviews with individuals and focus groups. Unlike systematic surveys, online reviews are not standardized and do not offer the same protection from fraudulent responses. And while both online and systematic reviews are likely to favor those who most want to share their opinions, online reviews cater to those who seek out the opportunity to comment. Also, because these are open and public forums, any individual is able to post comments and reviews on Yelp or Google; thus, a facility may have reviews posted by users who were never actually cared for by the SDTF.

Indeed, we found that some facilities had few or no reviews, making them invisible to organic online processes, a known challenge in evaluating online reviews of individual providers.5, 17, 18 Facilities with reviews and those without may be fundamentally different, limiting the ability to extrapolate from available data. Furthermore, what constitutes a meaningful volume of reviews and ratings remains to be determined. More systematic review processes often oversample to correct for or anticipate sparse evaluations.

This study also has strengths. The use of machine learning approaches both automates and, in some ways, objectifies the search for meaning in unstructured narrative. The combination allows the inexpensive analysis of enormous amounts of textual data. The identified themes have immediate face validity, even if they might not have been anticipated. Indeed, the ability to identify themes that only in retrospect seem relevant is a key strength of these non-hypothesis-driven approaches.

CONCLUSION

The nation’s management of SUD requires effective treatment and that, in turn, requires effective quality management. In lieu of quality ratings created and presented more systematically, current patients may use ratings contributed organically to online platforms such as Google and Yelp. While these rating platforms reflect biases, so do more systematic rating systems. This study reveals how data gathered from online review platforms can be aggregated and analyzed to identify themes relevant to patient satisfaction, and may provide a patient-centered approach to building components of quality metrics for specialized drug treatment facilities moving forward.