Does the Survey Method Affect the Net Promoter Score?
The extensively debated Net Promoter Score (NPS) originated in consumer marketing to measure brand loyalty. It has become a ubiquitous method to assess user sentiment across many industries, products, and services. NPS carries the widely touted tagline of "the one number you need to grow." NPS is typically collected in two ways: via in-app intercept surveys within the actual product/application, or as part of a large email campaign. NPS collected via email campaigns consistently produces lower scores than NPS collected via the in-app intercept method. Case studies outlining these differences are discussed.
Keywords: NPS · Satisfaction · Survey method · Sentiment
The extensively debated Net Promoter Score (NPS), which originated in consumer marketing to measure brand loyalty, has become a ubiquitous method to assess user sentiment across many industries, products, and services. NPS has the widely touted tagline of "the one number you need to grow" [1]. Business operations and executives have come to rely on NPS to assess how their products compare against others and against benchmarks. The calculation of the score is conservative, using only ratings of 9 and 10 as positive. The logic is that only someone who gives a 9 or 10 will remain loyal to a brand. The ease of a single question, and a score that is recognized as a cross-industry benchmark, make NPS popular among business leaders. In the enterprise setting, NPS is being used to compare internal products and services.
As UX practitioners working in an enterprise setting, we are finding that NPS collected via enterprise email campaigns is significantly lower than NPS collected via in-app intercept campaigns (prompting within the application). Case studies of this trend and concerns about NPS are described, along with recommendations for managing them.
Calculation of NPS
The score is reported as a single number that ranges from −100 to +100. Respondents are asked a single question: "How likely are you to recommend our company/product/service to a friend or colleague?" and rate their likelihood to recommend on a scale from 0 (extremely unlikely) to 10 (extremely likely). NPS is calculated as the difference between the proportion of high ratings (9 and 10), termed "promoters," and the proportion of low ratings (0–6), termed "detractors." This uneven grouping results in an intentionally skewed score for loyalty. Spool [2] has argued against this method of calculation, as it intentionally discards data and sacrifices diagnostic sensitivity to focus on the extremes.
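As a concrete illustration, the calculation above can be sketched in a few lines of Python (the sample ratings are hypothetical, not drawn from the case studies):

```python
def nps(ratings):
    """Net Promoter Score: the percentage of promoters (ratings of 9-10)
    minus the percentage of detractors (0-6), yielding -100 to +100.
    Passives (7-8) count toward the sample size but not the score."""
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return round(100 * (promoters - detractors) / n)

# Hypothetical sample: 4 promoters, 3 passives, 3 detractors
print(nps([10, 9, 9, 10, 8, 7, 7, 6, 5, 3]))  # -> 10
```

Note how the grouping discards within-band variation: a respondent shifting from 0 to 6 leaves the score unchanged, while a shift from 6 to 7 moves it, which is exactly the loss of sensitivity Spool criticizes.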
UX Practitioner Concerns with NPS Collection
On the web, it is common to see a request to provide an NPS rating soon after visiting a website. In the retail space, NPS is collected immediately following a service such as a hotel stay or online purchase. Large companies have extended this loyalty index to sentiment, and it is used for benchmarking internal enterprise tools, teams, and services.
Many UX practitioners are asked to collect NPS despite its shortcomings as a diagnostic tool for uncovering usability issues [1, 4]. The score does not include subscales to probe the "ease of use" or "satisfaction" that are important to UX practitioners, but due to its high correlation with overall satisfaction [5], it is often used as a proxy for satisfaction. To interpret what is driving the score, a practitioner must read and classify all of the optional comments left by raters, a practice that is time intensive and fraught with problems, such as subjective classification of the comments, small sample sizes for derived categories, and the inherent self-selection bias of making such comments optional. Further, a question involving a "recommendation" may not be meaningful when applied to applications end users are forced to use, often the norm in the enterprise, and may be interpreted inconsistently by respondents. Additionally, a large sample size is needed for NPS confidence; thus it should not be used in small-sample usability tests.
A final concern is how business leaders sometimes use NPS to grade different applications for purposes of reward and continued investment. This is problematic because it treats the score as being suitable for cross comparisons between applications, when the score is inextricably rooted to the perceived value, enjoyability, and complexity of the capabilities supported by the app. You cannot meaningfully compare the NPS of an app that helps you discover delicious and easy-to-prepare recipes to an app that calculates your potential tax burden, and conclude one is better than the other on the basis of that score.
NPS is typically collected in three ways: (1) via feedback campaigns within the actual product/application as an intercept survey; (2) as part of a large, randomly sampled email campaign; or (3) via a passive NPS survey that is self-initiated through a button or link in the user interface. This last, passive option usually results in fewer responses and lower scores than an intercept survey. This difference is expected: given its passive nature, it skews toward people who are highly motivated to provide an opinion.
2 Case Studies
2.1 Case Study 1 – IT Support Portal
Our first case study is of a worldwide IT Support Portal. A large email campaign survey that covered multiple products widely used throughout the enterprise, termed the workplace survey, measured an NPS of 2 for the IT Support application as an initial baseline soon after launch. After a year, the score had declined to −16; it increased to −4 over the next 6-month period and did not change in response to ongoing user experience efforts.
Shortly after the first year, an in-app intercept survey was introduced; the initial NPS measured this way was 4. NPS was measured again over the next two quarters, each time immediately after a substantial investment in the user experience. The score climbed first to 14 and then to 27, where it has remained in subsequent quarters.
For the email campaign, a careful study of the comments indicated that large numbers of people had never used the application, had rated an older release rather than what was currently available, or had rated the legacy app that had been sunset years earlier. Even more vexing were comments left by people who took the survey primarily for the opportunity to complain about aspects unrelated to the portal, such as a bad experience with a help desk agent or an objection to the concept of self-service, items not within the control of the UX designer or development team, and who left nothing to indicate whether they had ever used the app. This problem was compounded by the intermittent nature of a domain such as IT support: it is not a type of solution you use every day, but rather when things go awry.
2.2 Case Study 2 – Global Sales Quote Application
Another case is a global sales quote application. A large email survey campaign was targeted to registered users of the application. The email survey NPS was 10, with a response rate of 5%. Subsequently, the application introduced an intercept survey to collect NPS. NPS collected via the intercept survey was 31, with a response rate of 18%. Although these scores were collected at different points in time, the application did not have any major feature changes nor changes in its user base. This case shows the same trend of higher scores collected via the in-app intercept than via the email campaign. The difference in rating distributions is significant, χ²(2, N = 980) = 29.47, p < 0.05.
This case study is interesting because unlike the others described herein, the email survey was targeted toward all of the registered users of the application at the time, rather than screening participants or randomly sampling. Although the email survey was targeted at registered users, there is still no guarantee that the respondents use the application currently or have done so recently. A hypothesis for the difference between the NPS scores could be that when a survey request is received, users are more likely to fill it out in order to air their complaints, whereas if they are satisfied, they do not feel a need to contribute further. When active users are surveyed using an intercept survey, it seems possible that more promoters are responding to the NPS intercept survey as they receive it, especially since the response rate is much higher than in the email survey.
2.3 Case Study 3 – Enterprise Directory
Our next case is an online Employee Directory. It was also part of the large email campaign that included multiple products. The Employee Directory measured an NPS of 39 in the first half of 2018 and an NPS of 45 in the second half of 2018. Using an in-app intercept for the same time periods, the NPS measured 63 in the first half of 2018 and 67 in the second half.
2.4 Case Study 4 – Intranet Homepage
Another case study is our Intranet Homepage. It was also included in the large email campaign, where the Intranet Homepage measured an NPS of 10 in the first half of 2018 and an NPS of 11 in the second half. Using an in-app intercept within the homepage for the same time periods, the NPS measured 36 in the first half of 2018 and 34 in the second half.
2.5 Case Study 5 – Device Provisioning
Our final case study involves a website for device provisioning. This service assists users in ordering new workstations, tablets, and phones, as well as with returning equipment. The service is infrequently used. For the email campaign, an NPS of −2 was measured; in contrast, for the in-app intercept campaign the score was 48. A chi-squared test shows this difference is significant, χ²(2, N = 5020) = 156.43, p < 0.05.
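For readers who wish to run the same comparison on their own data, the chi-squared test of homogeneity over the promoter/passive/detractor counts from two survey methods can be sketched as follows. The counts below are hypothetical, not the case-study data, and the test is implemented by hand so the sketch stays dependency-free:

```python
def chi_square_stat(counts_a, counts_b):
    """Chi-squared statistic for a 2x3 contingency table of
    (detractor, passive, promoter) counts from two survey methods.
    With 2 rows and 3 columns, df = (2-1)*(3-1) = 2."""
    rows = [list(counts_a), list(counts_b)]
    grand_total = sum(sum(row) for row in rows)
    col_totals = [sum(col) for col in zip(*rows)]
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical (detractor, passive, promoter) counts per method
email = (120, 180, 150)      # assumed email-campaign sample
intercept = (90, 200, 260)   # assumed in-app intercept sample
stat = chi_square_stat(email, intercept)
# 5.991 is the critical value for df = 2 at alpha = 0.05
print(f"chi2(2) = {stat:.2f}, significant: {stat > 5.991}")
```

Comparing the statistic against the df = 2 critical value (5.991 at the 0.05 level) mirrors the p < 0.05 results reported for the case studies above.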
While NPS has long been controversial among UX practitioners for other reasons, we did not expect that the method of delivery would have such a profound impact on the score. In theory, if NPS is measuring the same underlying construct, scores collected using different methods should be closely aligned. We propose that the method of delivery changes what NPS is actually measuring, depending on several other factors, such as whether users can meaningfully recommend the app or solution, and whether the user is rating a recent experience with an app or a fading memory of it.
Compared to intercept surveys, randomly sampled email campaigns have a profound advantage in reaching people who may have abandoned an application or website at some point in the past and moved on to the competition. Unlike the consumer market, the enterprise rarely offers much choice between apps or solutions; to do your job, you must use what is provided. Email campaigns also offer the advantage of being able to carefully construct a stratified sample of users to ensure that responses generalize to the population of interest; although you do not have much control over who responds, it is possible to recruit additional users to target underrepresented subgroups.
Intercept surveys have the advantage of ease of use. No stratification is needed; your sample is limited to the active users who respond, and when you reach an acceptable margin of error, you simply terminate the survey. By its very nature, the intercept method is not the best way to obtain feedback from users who have abandoned a solution, but it is very effective at measuring the satisfaction of verified active users. From a UX practitioner's perspective, feedback from active users is vastly more useful than feedback from nonusers or users of older releases: active users can tell you what does or does not work for them, and their feedback can provide guidance on how to improve the user experience over time.
The consistently higher NPS scores from in-app intercept campaigns could be dismissed as inflated measures because they do not include users who abandoned the app. Abandonment is not an issue for the tools described here, as no alternative was available to the respondents. While abandonment is a risk where a choice between solutions exists, there is also a risk in including feedback from users who may not fully recall their experience and who arguably add noise to your data. When enough time has passed, NPS becomes more a measure of reputation than a rating based on actual experience; that may be invaluable information to marketing teams who are hyper-focused on concepts such as brand, but it does not move the ball forward for those in UX.
Several of the tools described here that show this trend require only that respondents have used the tool once in the last 6 months. The two tools with the most lenient entry gate, i.e., use within the last 6 months, have the lowest scores via the email campaign. There is no reason to believe that an NPS rating of a single experience that occurred months or years ago will be accurate. Human memory is infamously poor: Elizabeth Loftus has famously conducted research on eyewitness testimony, which has led to the overturning of wrongful convictions and illustrates the unreliability of human memory.
In contrast, in-app intercept campaigns provide certainty that your users are rating the app in front of them, not a memory of it, and can provide diagnostic insight into whether a change meant to fix a usability issue has increased NPS/satisfaction. In-app intercept campaigns can be timed to follow the release of new features, whereas we found email campaigns to be less sensitive to feature improvements, as you have no idea whether your users have recently used the app or had the opportunity to encounter the improvement. Compared to intercept surveys, NPS measured via email campaigns will always be a lagging indicator, and it may take months or years for even big UX improvements to be reflected in the score.
In the case studies we have reviewed, the NPS scores were not well aligned across survey methods. This raises the question: are we measuring the same underlying construct? To speculate, it appears that NPS collected via email campaigns is really measuring sentiment, or the reputation of the application or website; reputations are often set early and may change only with continued exposure. On the other hand, when the survey is delivered against the backdrop of the solution being rated, where there is ample information about the solution and reputation can easily be refuted by fact, NPS morphs into a satisfaction measure. In that context, NPS may be useful for diagnosing UX issues alongside other UX research. It is up to the UX practitioner to determine what construct they want to measure, and to choose wisely.
- Always ask the following question to gather comments, regardless of the rating given: "What would make you more likely to recommend it?"
- Use an intercept survey within the application to best target your user population and ensure respondents are active users.
- Time intercept surveys to provide insight into whether recent improvements have improved the user experience.
- If using an email campaign, carefully screen your users to ensure they have had actual experience with the solution within a short time window (e.g., 1–2 months).
- Avoid using email campaigns for infrequently used applications.
- Allow comments for every rating (0–10).
- When reporting NPS, always include the survey method, the distribution of scores, and the margin of error with a 95% confidence interval.
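The margin of error in that last recommendation can be sketched under the usual normal approximation, scoring each response +1 for a promoter, 0 for a passive, and −1 for a detractor (the counts below are hypothetical):

```python
import math

def nps_margin_of_error(promoters, passives, detractors, z=1.96):
    """Margin of error for an NPS, in score points, via the normal
    approximation. Each respondent is scored +1/0/-1, so the NPS is
    100x the sample mean and the usual standard error of the mean
    applies; z = 1.96 gives a 95% confidence interval."""
    n = promoters + passives + detractors
    p_promoter = promoters / n
    p_detractor = detractors / n
    mean = p_promoter - p_detractor              # NPS / 100
    variance = p_promoter + p_detractor - mean ** 2
    return 100 * z * math.sqrt(variance / n)

# Hypothetical sample of 550 responses
moe = nps_margin_of_error(promoters=260, passives=200, detractors=90)
print(f"+/- {moe:.1f} NPS points")  # roughly +/- 6 points
```

With only 30 responses at the same proportions, the same formula gives roughly ±26 points, which illustrates why NPS from small-sample usability tests is rarely meaningful.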
- 1. Reichheld, F.F.: The One Number You Need to Grow. Harvard Business Review, December 2003
- 2. Spool, J.M.: Net Promoter Score Considered Harmful (and What UX Professionals Can Do About It). https://articles.uie.com/net-promoter-score-considered-harmful-and-what-ux-professionals-can-do-about-it/
- 3. Fessenden, T.: Net Promoter Score: What a Customer-Relations Metric Can Tell You About Your User Experience (2016). https://www.nngroup.com/articles/nps-ux/
- 4. Sauro, J.: Should the Net Promoter Score Go? 5 Common Criticisms Examined (2014). https://measuringu.com/nps-go/
- 5. Sauro, J.: Is the Net Promoter Score a Better Measure than Satisfaction? (2018). https://measuringu.com/nps-sat/