INTRODUCTION

Websites providing user-submitted ratings of goods and services have become ubiquitous and are commonly used by consumers to inform their decision-making. This trend has also spread to healthcare,1–14 as there are now dozens of publicly available websites dedicated to providing online ratings of physicians.11 In a recent survey, 65% of individuals reported being aware of physician rating websites, and 35% reported seeking online physician reviews within the past year.15 Among those who sought physician ratings information online, 35% reported selecting a physician based on good ratings and 37% reported avoiding a physician with bad ratings.15

Reaction to physician rating websites has been mixed. Proponents argue that broad access to patient-submitted ratings promotes transparency, empowers patients to make informed decisions, and provides an impetus for underperforming physicians to improve.16 Detractors, on the other hand, have expressed concern about the websites serving as a forum for disgruntled patients, with the potential to defame physicians and cause them psychological harm.16, 17 Since the websites lack a mechanism for validating reviewer identity, there is also the possibility that multiple reviews could be left by the same individual or that reviews could be submitted by individuals posing as patients, or even by physicians themselves.18 Critics also note that the online ratings have not been found to reflect objective measures of clinical quality.19, 20

Concerns have also been raised regarding the validity of the online ratings, as the sample size used to formulate each rating is often small.3, 5, 8, 21 Detractors also point out that the individuals who post online reviews may not be representative of the population as a whole.16

While the companies that publish these ratings online tend to be the best known, they are not the only entities seeking to assess patient ratings of physicians. Many hospitals, health maintenance organizations (HMOs), and health systems also conduct surveys of patients—for example, via the Press Ganey Medical Practice Survey22 or the Consumer Assessment of Healthcare Providers and Systems Clinician and Group (CG-CAHPS) Survey23—and use the data to generate their own internal ratings of their physicians. Given that these internal ratings are based on validated survey methodologies and include a large number of responses solicited from a broad and random sample of the patient population, it is possible that they could provide a better estimate of each physician’s actual patient satisfaction rating. While some prior reports have sought to assess the correlation between online and internal physician ratings, these have primarily been small-scale studies of academic physicians that were restricted to a limited number of websites24 or a single subspecialty,25 institution,26 or department.27

In this study, we sought to determine the extent to which publicly available online ratings of physicians, which are typically based on a small number of unsolicited reviews, correlate with internal patient-submitted ratings from a large integrated healthcare delivery system. We also examined the association between physician years in practice and patient-submitted rating separately for the online and internal ratings, as younger individuals have been shown to be more likely to leave online reviews,28 and more likely to prefer younger doctors.29

METHODS

Study Population and Selection Criteria

Kaiser Permanente, the nation’s largest HMO, operates in 8 regions throughout the USA. Southern California is the largest such region, with over 4.5 million members as of 2018.30 For this study, all partnered physicians in the Southern California region of Kaiser Permanente, across all specialties with ambulatory clinic visits, were eligible for inclusion (n = 6656). Physicians were excluded if demographic information was not available (n = 250), if the physician’s age was over 65 (a mandatory retirement age in the Southern California region of Kaiser Permanente; n = 182), or if a valid internal patient-submitted rating was not available (n = 786; see below). Application of these inclusion and exclusion criteria yielded 5438 physicians for analysis.

Demographics

Physician demographics were obtained from the healthcare system, including sex, race/ethnicity, board certification, specialty, fellowship training, and medical school graduation year. Physicians in internal medicine, family medicine, general pediatrics, urgent care, and continuing care were categorized as “non-specialists,” while physicians in other fields of medicine were categorized as “specialists.” Years in practice was calculated for each physician as the number of years that had elapsed since the year of medical school graduation, minus the typical duration of residency in the physician’s specialty (e.g., 3 years for internal medicine) and, if applicable, fellowship (e.g., 2 years for infectious disease). Years in practice was categorized as 10 or fewer, 11–19, or 20 or more. Since only partnered physicians were included in this study and it typically takes 3 years to achieve partnership in the Kaiser Permanente system, the 10 or fewer group consisted primarily of physicians with between 3 and 10 years in practice.
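For illustration, this calculation and categorization can be sketched as follows (a minimal Python example; the function and field names, such as grad_year and residency_years, are hypothetical and not taken from the study dataset):

```python
# Minimal sketch of the years-in-practice calculation described above.
# Field names (grad_year, residency_years, fellowship_years) are
# illustrative, not the study dataset's actual variables.

def years_in_practice(grad_year: int, residency_years: int,
                      fellowship_years: int = 0, current_year: int = 2016) -> int:
    """Years since medical school graduation, minus typical training duration."""
    return (current_year - grad_year) - residency_years - fellowship_years

def practice_category(years: int) -> str:
    """Categorize as in the study: 10 or fewer, 11-19, or 20 or more."""
    if years <= 10:
        return "10 or fewer"
    if years <= 19:
        return "11-19"
    return "20 or more"

# Example: a 2000 graduate in internal medicine (3-year residency, no fellowship)
print(practice_category(years_in_practice(2000, 3)))  # -> "11-19"
```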

Internal Ratings

Kaiser Permanente routinely solicits feedback on each physician from the patients that he or she sees in clinic. This feedback is collected via a 27-question Kaiser Permanente–specific survey which is sent via mail as well as e-mail to a random sampling of patients following the clinic appointment. Survey completion is voluntary, and all responses are confidential.

For each physician, the internal patient-submitted rating is based on responses to the question, “How would you rate this doctor or health care provider?” Responses to this question are on a 10-point scale ranging from 1 (“Worst doctor or health care provider possible”) to 10 (“Best doctor or health care provider possible”). The internal rating is then calculated as the percentage of respondents who select 9 or 10 for this question. (For example, a physician with 92% of responses to this question in the 9–10 range would have an internal patient-submitted rating of 92.0.) In the Southern California region of Kaiser Permanente, internal ratings are considered valid once they are based on 30 or more responses. For the purposes of this study, internal ratings of less than 80.0 were considered to be “low.” The internal ratings utilized in this analysis were based on survey responses received between July 2013 and June 2015, during which time the response rate was approximately 18%.
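For illustration, the top-box calculation described above can be sketched as follows (a minimal Python example; the variable names are hypothetical and not the health system’s actual implementation):

```python
# Minimal sketch of the internal "top-box" rating described above, assuming
# `responses` holds the 1-10 scores from the solicited survey.

from typing import Optional

def internal_rating(responses: list[int], min_responses: int = 30) -> Optional[float]:
    """Percent of respondents choosing 9 or 10; None if fewer than 30 responses."""
    if len(responses) < min_responses:
        return None  # rating not yet considered valid
    top_box = sum(1 for r in responses if r >= 9)
    return round(100.0 * top_box / len(responses), 1)

# Example: 92 of 100 respondents chose 9 or 10 -> internal rating of 92.0
scores = [10] * 92 + [7] * 8
print(internal_rating(scores))  # 92.0
```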

Online Ratings

To find each physician’s online rating(s), web-based searches were conducted during the months of June and July 2016 by 3 individuals who were blinded to the internal ratings (NRU, SYMS, and KCX). These searches were performed via the Google search engine utilizing each physician’s first, middle (if available), and last names as well as the degree (“MD”) and state (“CA”). Matching was performed on the basis of this information, as well as each physician’s medical specialty and practice location. All online ratings identified within the first 20 search results were recorded. For each online rating, we recorded the name of the website, the rating, and the number of patient reviews used to formulate the rating. For each physician, the overall online rating was then calculated as the average of all online ratings found, weighted by the number of reviews used to formulate each rating. All ratings were on a 5-point scale, and ratings below 3 were considered to be “low.”
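For illustration, the review-count-weighted average described above can be sketched as follows (a minimal Python example with made-up ratings; not the actual aggregation code used in the study):

```python
# Minimal sketch of the review-count-weighted overall online rating described
# above. Each (rating, n_reviews) pair stands for one rating website's entry
# for a physician; the numbers are made up for illustration.

def overall_online_rating(site_ratings: list[tuple[float, int]]) -> float:
    """Average of per-site ratings, weighted by the number of reviews per site."""
    total_reviews = sum(n for _, n in site_ratings)
    weighted_sum = sum(rating * n for rating, n in site_ratings)
    return weighted_sum / total_reviews

# Example: 4.5 from 4 reviews on one site, 2.0 from a single review on another
print(round(overall_online_rating([(4.5, 4), (2.0, 1)]), 2))  # 4.0
```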

Statistical Analysis

To analyze the correlation between the online and internal ratings, the Spearman rank correlation was calculated. To evaluate the association between years in practice and physician rating, multivariable logistic regression was performed separately for the online and internal ratings, in both instances controlling for physician sex, race/ethnicity, board certification, and specialty (specialist or not). All tests were two-sided, and a p value of < 0.05 was considered significant. Statistical analysis was performed by two members of the research team (K.O. and C.Y.K.) with the use of SAS (version 9; SAS Institute, Cary, NC) and SPSS (version 22; IBM Corporation, Armonk, NY).
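For illustration, the two analyses described above can be approximated with open-source tools as follows (a sketch only; the study itself used SAS and SPSS, and the file and column names here, such as physician_ratings.csv and low_rating, are assumptions for the example):

```python
# Illustrative approximation of the analyses described above using open-source
# tools. The file name and column names (online_rating, internal_rating,
# low_rating, years_category, sex, race, board_certified, specialist) are
# hypothetical, not the study's actual variables.

import pandas as pd
from scipy.stats import spearmanr
import statsmodels.formula.api as smf

df = pd.read_csv("physician_ratings.csv")  # hypothetical analysis file

# Spearman rank correlation between the overall online and internal ratings
rho, p_value = spearmanr(df["online_rating"], df["internal_rating"])
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")

# Multivariable logistic regression: odds of a low rating by years in practice,
# adjusting for sex, race/ethnicity, board certification, and specialist status
model = smf.logit(
    "low_rating ~ C(years_category) + C(sex) + C(race)"
    " + board_certified + specialist",
    data=df,
).fit()
print(model.summary())
```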

Funding Source

There was no external source of funding for this study.

IRB Approval

All elements of the study were approved by the Kaiser Permanente Institutional Review Board (IRB).

RESULTS

There were 5438 physicians who met all inclusion and exclusion criteria. Forty percent (2195/5438) were female, and the vast majority were board certified (96.0%; 5221/5438) (Table 1). The mean internal rating was 88.0 (median 89.0, standard deviation 6.73). The internal ratings were based on a mean of 119 survey returns over the 2-year period (median 117, standard deviation 45, range 30–342).

Table 1 Physician Demographics

Approximately three-quarters of physicians were found to be rated online (77.1%; 4191/5438), most commonly on Vitals.com (n = 3399), followed by Healthgrades.com (n = 2647), UCompareHealthCare.com (n = 1399), RateMDs.com (n = 315), and WebMD.com (n = 285). One thousand five hundred fifty-five physicians were found to be rated on a single website, 1504 on two, 887 on three, 220 on four, 24 on five, and 1 on six websites. Among physicians who were found to be rated online, the mean online rating was 4.1 out of 5 (median 4.8, standard deviation 1.2). The online ratings were based on a mean of 3.5 reviews (median 2, standard deviation 10.0).

The correlation between the overall online rating and the internal rating was weak (Spearman’s rho = 0.23). This correlation increased as the number of reviews used to formulate each online rating increased, reaching 0.42 for online ratings which were based on 15 or more reviews (Table 2). The correlation coefficients for the individual websites ranged from 0.16 to 0.30 (Table 3).

Table 2 Correlation Between Overall Online Rating and Internal Rating, by Number of Reviews Used to Formulate the Overall Online Rating
Table 3 Correlation Between Online Rating and Internal Rating, by Physician Rating Website

Of 4191 physicians found to be rated online, 22.2% (929/4191) were found to have one or more low online ratings (< 3.0 out of 5). The mean number of reviews used to formulate these low online ratings was 2.8 (median 1, standard deviation 3.7). Of physicians with any low online rating, 81.5% (757/929) had a positive internal rating (≥ 80.0). Of physicians with a low internal rating (< 80.0), 63.6% had no low online ratings (< 3.0 out of 5). In this categorical comparison, there was also poor agreement between the online and internal ratings (κ = 0.113, p < 0.001).
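For illustration, this categorical agreement can be computed as follows (a sketch using scikit-learn, with thresholds as defined in Methods; the file and column names are hypothetical and not the study’s actual analysis code):

```python
# Illustrative sketch of the categorical agreement reported above: each
# physician's ratings are dichotomized as low vs. not low and Cohen's kappa
# is computed. The file and column names are hypothetical.

import pandas as pd
from sklearn.metrics import cohen_kappa_score

df = pd.read_csv("physician_ratings.csv")  # hypothetical analysis file
low_online = (df["overall_online_rating"] < 3.0).astype(int)
low_internal = (df["internal_rating"] < 80.0).astype(int)

print(f"kappa = {cohen_kappa_score(low_online, low_internal):.3f}")
```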

Among the internal ratings, physician years in practice was not found to be associated with the likelihood of a low rating. Among the online ratings, however, increasing physician years in practice was found to be associated with a greater likelihood of any low online rating, as well as a low overall online rating (weighted average; Table 4).

Table 4 Association Between Physician Years in Practice and Likelihood of a Low Rating, Internally and Online

DISCUSSION

Physician online ratings are ubiquitous and influential,15 but they also have their detractors. Critics point out that the online ratings may not accurately reflect patient satisfaction given that the number of reviews used to formulate each rating is often small. In four recent studies on the topic, for example, the average number of reviews used to calculate each physician’s online rating was 2.4,5 3.2,8 2.7,3 and 4.5,21 with nearly half of the ratings based on a single review.8

In our study, the online ratings were based on an average of 3.5 reviews (median 2), and correlation with the internal ratings (which were based on an average of 119 survey returns) was weak. This correlation increased with the number of reviews used to calculate each online rating, reaching 0.42 (“moderate”) for ratings which were based on 15 or more reviews.

While some prior reports have sought to assess the correlation between online and internal physician ratings, these have primarily been small-scale studies of academic physicians that were restricted to a limited number of websites or a single subspecialty, institution, or department. Ryan and colleagues examined online ratings from Vitals.com and Healthgrades.com for 16 otolaryngologists at a single academic medical center and did not find a significant correlation with results from the Press Ganey Medical Practice Survey.27 Chen and associates conducted a similar study among 200 faculty members at the University of Utah and documented similar results.24 Widmer and colleagues studied physicians at the Mayo Clinic, matching 113 physicians with negative online reviews to 113 physicians without negative reviews, and found similar Press Ganey scores in the two groups.26 Ricciardi and colleagues compared publicly available internal ratings with online ratings for 415 orthopaedic surgeons, although no overall correlation was calculated.25

Critics of online physician ratings also point out that the individuals who choose to submit reviews online may not be representative of the general population. For example, prior research has suggested that younger individuals may be more likely to leave online reviews.28 Since younger individuals may also be more likely to prefer younger doctors,29 we sought to examine the association between physician years in practice and patient-submitted ratings separately for the online and internal ratings. In our study, we found that negative online ratings were significantly more common among physicians who had been in practice for a longer period of time. Among the internal ratings, however, no such association was observed. While the relationship between physician age and patient outcomes is uncertain,31, 32 it does appear that physicians with a greater number of years in practice are more likely to have a negative overall online rating than their less experienced colleagues.

While the online rating websites may have their flaws, most stakeholders agree that reviews of physicians have utility in that they enable patients to make more informed decisions regarding their care. Recently, a growing number of hospitals and health systems have begun to publish the internal patient-submitted ratings of their physicians.33 Since these internal ratings are based on a larger number of reviews from a broad and random sample of the patient population, they may provide better information on physicians for patients and consumers. In addition, such a strategy may allow hospitals to improve the perception of their own physicians who have been rated negatively online. In our study, for example, over 80% of physicians with a low online rating were found to have a positive rating internally.

The results of this investigation should be considered in light of our study design. We compared online and internal ratings for a large number of community-based physicians from a wide variety of facilities and specialties, which may increase the generalizability of our findings. Some may question whether patients would feel comfortable rating physicians honestly in surveys returned to the integrated healthcare delivery system, even with assurances of confidentiality. However, the percentage of physicians with low ratings internally (12.6%; 683/5438) was similar to the percentage of physicians who had a low overall rating online (13.1%; 549/4191). Since we do not have demographic data on the patients whose reviews were used to generate the internal Kaiser Permanente ratings, the degree to which these individuals are representative of the patient population cannot be assessed. Because the integrated healthcare delivery system surveys patients randomly after a clinic visit, patients who attend clinic visits more frequently may be oversampled. However, this may still represent an improvement over the online ratings, which do not feature a formal sampling methodology. Finally, it is possible that some online ratings could have been missed by our search strategy. However, the strategy utilized in our study mimics the one commonly used by patients when seeking information on physicians online.

In summary, online physician ratings do not correlate well with internal ratings from a large integrated healthcare delivery system. The correlation between the online and internal ratings increased with the number of reviews used to formulate each online rating, however, suggesting that the weak overall correlation may be related to the small sample sizes used to formulate most online ratings. Given that many patients are not aware of the statistical issues associated with small sample sizes, we would recommend that online rating websites refrain from displaying a physician’s rating until the sample size is sufficiently large (for example, at least 15 patient reviews). However, hospitals and health systems may be able to provide better information for patients by publishing the internal ratings of their physicians.