
Aggregation of consumer ratings: an application to Yelp.com

Quantitative Marketing and Economics

Abstract

Because consumer reviews leverage the wisdom of the crowd, the way in which they are aggregated is a central decision faced by platforms. We explore this “rating aggregation problem” and offer a structural approach to solving it, allowing for (1) reviewers to vary in stringency and accuracy, (2) reviewers to be influenced by existing reviews, and (3) product quality to change over time. Applying this to restaurant reviews from Yelp.com, we construct an adjusted average rating and show that even a simple algorithm can lead to large information efficiency gains relative to the arithmetic average.


Fig. 1
Fig. 2


Notes

  1. Luca (2011) shows that Yelp consumers respond directly to the average rating even though it is coarser than the underlying information. The importance of the average rating is also supported by an online survey we conducted for this study. In this survey, we ask subjects to report their general use and understanding of restaurant ratings, without mentioning Yelp. Out of the 239 respondents, 93.7% use the average rating to choose a restaurant, but a much lower percentage of respondents said that they pay attention to other review information such as the number of reviews, rating trends, or reviewer profile. More details of the survey are presented in Section 6.

  2. Yelp had 167 million unique visitors per month in the first quarter of 2016 (source: http://www.yelp.com/factsheet), but according to a 2011 blog post of Yelp, journalist Susan Kuchinskas estimated that “only 1 percent of users will actively create content. Another 9 percent, the editors, will participate by commenting, rating, or sharing the content. The other 90 percent watch, look, and read without responding.” (https://www.yelpblog.com/2011/06/yelp-and-the-1990-rule, accessed on June 5, 2016).

  3. We assume that horizontal preferences of reviewers affect reviewer stringency. Hence, when we present the vertical quality to general readers of Yelp reviews, we should benchmark the adjusted average to the stringency of one type of reviewer. We choose to benchmark it to a reviewer with average attributes. The construction of reviewer horizontal preference is presented in Section 2.1.

  4. The “Elite” status is a badge displayed next to the reviewer name, and is awarded by Yelp to prolific reviewers who write high-quality reviews.

  5. Based on hotel reservation data from Travelocity.com, which include consumer-generated reviews from Travelocity.com and TripAdvisor.com, Ghose et al. (2012) estimate consumer demand for various product attributes and then rank products according to estimated “expected utility gain.”

  6. Readers interested in consumer usage of Yelp reviews can refer to Luca (2011), who combines the same Yelp data as in this paper with restaurant revenue data from Seattle. More generally, there is strong evidence that consumer reviews are an important source of information in a variety of settings. Chevalier and Mayzlin (2006) find that consumer ratings have predictive power for book sales. Both Godes and Mayzlin (2004) and Duan et al. (2008) find that the spread of word-of-mouth affects sales by raising consumer awareness; the former measure the spread by “the dispersion of conversations across communities” and the latter by the volume of reviews. Duan et al. (2008) argue that, after accounting for the endogenous correlation among ratings, online user reviews have no significant impact on movies’ box office revenues.

  7. We assume that a reviewer submits one review for a restaurant. Therefore, the order of the review indicates the reviewer’s identity. On Yelp.com, reviewers are only allowed to display one review per restaurant.

  8. Some reviewers are by nature generous and obtain psychological gains from submitting reviews that are more favorable than what they actually feel. In this case, \(\theta_{rn} > 0\) represents leniency.

  9. The correlation structure is detailed in Appendix B.3.

  10. Note that the martingale assumption entails two features in the stochastic process: first, conditional on \(\mu _{rt_{n-1}}\), \(\mu _{rt_{n}}\) is independent of the past signals \(\{s_{rt_{1}},...,s_{rt_{n-1}}\}\); second, conditional on \(\mu _{rt_{n}}\), \(s_{rt_{n}}\) is independent of the past signals \(\{s_{rt_{1}},...,s_{rt_{n-1}}\}\). These two features greatly facilitate reviewer n’s Bayesian estimate of restaurant quality. This is also why we choose martingale over other statistical processes (such as AR(1)).

  11. The cuisine indicators describe whether a restaurant is traditional American, new American, European, Mediterranean, Latin American, Asian, Japanese, seafood, fast food, lounge, bar, bakery/coffee, vegetarian, or others. They are not mutually exclusive. The five price categories are (1,2,3,4) as defined by Yelp plus a missing price category (which we code as 0).

  12. By construction, the sample mean of each factor is normalized to 0 and sample variance normalized to 1.

  13. If reviewer i has not reviewed any restaurant yet, we set her taste equal to the mean characteristics of restaurants (\(C_{it} = 0\)).

  14. We tried estimating a model that allows \(\rho_{i}\) to vary by reviewer attributes other than elite status, but none of the other attributes significantly affects \(\rho\) based on the likelihood ratio test.

  15. Our model of time trend in addition to year fixed effects is consistent with Godes and Silva (2012), who suggest that the negative temporal trend may be due to the fact that reviewers are becoming more critical and more negative in general, and they find that after conditioning on the year a review was written, ratings increase over time. In our case, the time trend since the first review of a restaurant remains negative after we control for year fixed effects.

  16. Note that this decline is in addition to the random walk evolution of restaurant quality because the martingale deviation is assumed to have a mean of zero.

  17. We define the raw age by calendar days since a restaurant’s first review on Yelp and normalize the age variable in our estimation by (raw age-548)/10. We choose to normalize age relative to the 548th day because the downward trend of reviews is steeper in a restaurant’s early reviews and flattens at roughly 1.5 years after the first review.

  18. A summary of the statistical data generating process is available in Appendix B.1.

  19. The parameters to be estimated are \(\{\mu _{r0}\}_{r = 1}^{R}\), \(\sigma_{\xi}\), \((\sigma_{e},\sigma_{ne})\), \((\rho_{e},\rho_{ne})\), \((\alpha_{year_{t}}, \alpha_{numrev}, \alpha_{freqrev}, \alpha_{matchd}, \alpha_{tastevar}, \lambda_{(e-ne)0}, \beta_{age1}, \beta_{age2}, \beta_{numrev}, \beta_{freqrev}, \beta_{matchd}, \beta_{tastevar})\), and \((\alpha_{age1},\alpha_{age2})\). In an extended model, we also allow \(\{\sigma_{e},\sigma_{ne},\alpha_{age1},\alpha_{age2},\sigma_{\xi}\}\) to differ for ethnic and non-ethnic restaurants.

  20. This is relative to the review submitted 1.5 years after the first review, because age is normalized by (raw age - 548)/10.

  21. The estimation of \(E(\mu _{rt_{n}}|s_{rt_{1}},s_{rt_{2}},..,s_{rt_{n}})\) is detailed in Appendix B.2.

  22. In Hu et al. (2009), ratings are found to follow bimodal distributions on Amazon (with many one and five stars) and the paper attributed this to the tendency to review when opinions are extreme. We do not find the bimodal distribution pattern on Yelp that Hu et al. (2009) provide as evidence of significant reviewer selection.

  23. Reviews identified by Yelp as fake reviews are removed from the Yelp pages. We do not observe these reviews and do not consider them in our analysis.

  24. We can potentially predict the elite status using a reviewer’s past activities on Yelp, but since we do not observe how many ratings the sample reviewers have left outside Seattle, we cannot reliably predict elite status.

  25. Note that the martingale evolution of restaurant quality implies an increasing variance around the restaurant’s fixed effect, while positive social incentives imply a decreasing variance.

  26. Specifically, we have \(\exp(AIC_{Bayesian}-AIC_{LimitedAttention})=\exp((\log L_{Bayesian}-\log L_{LimitedAttention})/2)=46{,}630\).

  27. We create these figures by simulating a large number of ratings according to the underlying model, and then computing adjusted versus simple average of ratings at each time of review.

  28. In the simulation with full model specifications, the assumption about whether restaurant age affects restaurant quality or reviewer bias is nonessential for comparing the mean absolute errors of the two aggregating methods. The adjusted average always corrects for reviewer bias, whereas the simple average always reflects the sum of the changes in quality and reviewer bias.

  29. There is a large literature on social image and social influence, with most evidence demonstrated in lab or field experiments. For example, Ariely et al. (2009) show that social image is important for charity giving and private monetary incentives partially crowd out the image motivation.

References

  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.

  • Akerlof, G.A. (1980). A theory of social custom, of which unemployment may be one consequence. Quarterly Journal of Economics, 94(4), 749–75.

  • Alevy, J.E., Haigh, M.S., & List, J.A. (2007). Information cascades: evidence from a field experiment with financial market professionals. The Journal of Finance, 62(1), 151–180.

  • Ariely, D., Bracha, A., & Meier, S. (2009). Doing good or doing well? Image motivation and monetary incentives in behaving prosocially. American Economic Review, 99(1), 544–555.

  • Banerjee, A.V. (1992). A simple model of herd behavior. The Quarterly Journal of Economics, 107(3), 797–817.

  • Bénabou, R., & Tirole, J. (2006). Incentives and prosocial behavior. American Economic Review, 96(5), 1652–1678.

  • Bikhchandani, S., Hirshleifer, D., & Welch, I. (1992). A Theory of Fads, Fashion, Custom and Cultural Change as Informational Cascades. Journal of Political Economy, 100(5), 992–1026.

  • Brown, J., Hossain, T., & Morgan, J. (2010). Shrouded attributes and information suppression: evidence from the field. Quarterly Journal of Economics, 125(2), 859–876.

  • Chen, Y., Maxwell Harper, F., Konstan, J., & Li, S.X. (2010). Social comparison and contributions to online communities: a field experiment on MovieLens. American Economic Review, 100(4), 1358–1398.

  • Chevalier, J.A., & Mayzlin, D. (2006). The effect of word of mouth on sales: online book reviews. Journal of Marketing Research, 43(3), 345–354.

  • Duan, W., Gu, B., & Whinston, A.B. (2008). Do online reviews matter? An empirical investigation of panel data. Decision Support Systems, 45(4), 1007–1016.

  • Eyster, E., & Rabin, M. (2010). Naïve Herding in Rich-Information Settings. American Economic Journal: Microeconomics, 2(4), 221–243.

  • Fradkin, A., Grewal, E., & Holtz, D. (2017). The Determinants of Online Review Informativeness: Evidence from Field Experiments on Airbnb. Working paper.

  • Ghose, A., Ipeirotis, P., & Li, B. (2012). Designing Ranking Systems for Hotels on Travel Search Engines by Mining User-Generated and Crowd-Sourced Content. Marketing Science.

  • Glazer, J., McGuire, T.G., Cao, Z., & Zaslavsky, A. (2008). Using global ratings of health plans to improve the quality of health care. Journal of Health Economics, 27(5), 1182–95.

  • Godes, D., & Mayzlin, D. (2004). Using Online Conversations to Study Word-of-Mouth Communication. Marketing Science, 23(4), 545–560.

  • Godes, D., & Silva, J.C. (2012). Sequential and temporal dynamics of online opinion. Marketing Science, 31(3), 448–473.

  • Hu, N., Zhang, J., & Pavlou, P. (2009). Overcoming the J-shaped distribution of product reviews. Communications of the ACM.

  • Li, X., & Hitt, L. (2008). Self-selection and information role of online product reviews. Information Systems Research, 19(4), 456–474.

  • Ljungqvist, L., & Sargent, T.J. (2012). Recursive macroeconomic theory, 3rd edition. Cambridge: MIT Press.

  • Luca, M. (2011). Reviews, Reputation, and Revenue: The Case of Yelp.com. Harvard Business School working paper.

  • Luca, M., & Smith, J. (2013). Salience in Quality Disclosure: Evidence from The US News College Rankings. Journal of Economics & Management Strategy.

  • Luca, M., & Zervas, G. (2016). Fake it till you make it: reputation, competition, and Yelp review fraud. Management Science.

  • Mayzlin, D., Dover, Y., & Chevalier, J.A. (2014). Promotional Reviews: an Empirical Investigation of Online Review Manipulation. American Economic Review.

  • Miller, N., Resnick, P., & Zeckhauser, R.J. (2005). Eliciting informative feedback: the peer-prediction method. Management Science, 51(9), 1359–1373.

  • Moe, W.W., & Trusov, M. (2011). The value of social dynamics in online product ratings forums. Journal of Marketing Research, 48(3), 444–456.

  • Moe, W.W., & Schweidel, D.A. (2012). Online product opinions: incidence, evaluation, and evolution. Marketing Science, 31(3), 372–386.

  • Muchnik, L., Aral, S., & Taylor, S.J. (2013). Social influence bias: a randomized experiment. Science, 341(6146), 647–651.

  • Nosko, C., & Tadelis, S. (2015). The Limits of reputation in platform Markets: an empirical analysis and field experiment. NBER Working Paper No. 20930, January 2015.

  • Pope, D. (2009). Reacting to rankings: evidence from America’s best hospitals. Journal of Health Economics, 28(6), 1154–1165.

  • Wang, Q., Goh, K.Y., & Lu, X. (2012). How does user generated content influence consumers’ new product exploration and choice diversity? An empirical analysis of product reviews and consumer variety seeking behaviors. Working paper.

  • Wang, Z. (2010). Anonymity, Social Image, and the Competition for Volunteers: A Case Study of the Online Market for Reviews. The B.E. Journal of Economic Analysis & Policy.

  • Welch, G., & Bishop, G. (2001). An introduction to the Kalman filter. In Proceedings of the Siggraph Course, Los Angeles.

  • Wu, C., Che, H., Chan, T.Y., & Lu, X. (2015). The Economic Value of Online Reviews. Marketing Science, 34(5), 739–754.

Author information

Corresponding author

Correspondence to Ginger Jin.

Appendices

Appendices for aggregation of consumer ratings: an application to Yelp.com

Appendix A: Model of reviewer incentives to deviate from prior reviews

In this appendix, we present an alternative model that captures a reviewer’s incentive to differentiate from prior ratings, as a counterpart to our baseline model in Section 2.1. It gives rise to exactly the same rating equation, except that \(\rho_{i} < 0\).

If social incentives motivate reviewer i to deviate from prior reviews, we can model it as reviewer i choosing to report \(x_{rt_{n}}\) to minimize a slightly different objective:

$$F_{rn}^{(2)}=(x_{rt_{n}}-(s_{rt_{n}}+\theta_{rn}))^{2}-w_{i}[x_{rt_{n}}-E(\mu_{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...x_{rt_{n-1}})]^{2} $$

where \(E(\mu _{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...x_{rt_{n-1}})\) is the posterior belief of true quality given all the prior ratings (not counting i’s own signal) and \(w_{i} > 0\) is the marginal utility that reviewer i gets from reporting differently from prior ratings. By Bayes’ Rule, \(E(\mu _{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...x_{rt_{n-1}},s_{rt_{n}})\) is a weighted average of \(E(\mu _{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...x_{rt_{n-1}})\) and i’s own signal \(s_{rt_{n}}\), which we can write as \(E(\mu _{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...x_{rt_{n-1}},s_{rt_{n}})=\alpha \cdot s_{rt_{n}}+(1-\alpha )\cdot E(\mu _{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...x_{rt_{n-1}})\). Combining this with the first order condition of \(F_{rn}^{(2)}\), we have

$$\begin{array}{@{}rcl@{}} x_{rt_{n}}^{(2)} &=& \frac{1}{(1-w_{i})} \theta_{rn}+\frac{1-\alpha+w_{i}\alpha}{(1-w_{i})(1-\alpha)}s_{rt_{n}}\\ &&-\frac{w_{i}}{(1-w_{i})(1-\alpha)}E(\mu_{rt_{n}}|x_{rt_{1}},x_{rt_{2}},..x_{rt_{n-1}},s_{rt_{n}})\\ & = & \lambda_{rn}+(1-\rho_{i})s_{rt_{n}}+\rho_{i}E(\mu_{rt_{n}}|x_{rt_{1}},x_{rt_{2}},..x_{rt_{n-1}},s_{rt_{n}}) \end{array} $$

if we redefine \(\lambda _{rn}=\frac {1}{1-w_{i}}\theta _{rn}\) and \(\rho _{i}=-\frac {w_{i}}{(1-w_{i})(1-\alpha )}\). Note that the optimal ratings in the two scenarios take exactly the same form, except that \(\rho_{i} > 0\) if the reviewer tries to be close to the best guess of the true restaurant quality in her report and \(\rho_{i} < 0\) if she is motivated to deviate from prior ratings. The empirical estimate of \(\rho_{i}\) will inform us which scenario is more consistent with the data. In short, the weight \(\rho_{i}\) is an indicator of how a rating correlates with past ratings. As long as later ratings contain information from past ratings, aggregation needs to weigh early and late reviews differently.
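The algebra above can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the input values are hypothetical, and it assumes \(w_{i} < 1\) so that the first order condition identifies a minimum. It computes the rating once directly from the first order condition of \(F_{rn}^{(2)}\) and once from the \(\lambda_{rn}\), \(\rho_{i}\) form, and the two should agree.

```python
def optimal_rating_direct(s, theta, prior_mean, w):
    """Solve the first order condition of F2 for the reported rating x.

    F2 = (x - (s + theta))**2 - w * (x - prior_mean)**2, so
    x = (s + theta - w * prior_mean) / (1 - w).
    Assumption: w < 1, otherwise the stationary point is not a minimum.
    """
    return (s + theta - w * prior_mean) / (1.0 - w)


def optimal_rating_rho_form(s, theta, prior_mean, w, alpha):
    """Same rating written as lambda + (1 - rho) * s + rho * posterior_mean."""
    # Posterior mean is a weighted average of own signal and prior belief.
    posterior_mean = alpha * s + (1.0 - alpha) * prior_mean
    lam = theta / (1.0 - w)
    rho = -w / ((1.0 - w) * (1.0 - alpha))
    rating = lam + (1.0 - rho) * s + rho * posterior_mean
    return rating, rho


# Hypothetical values, chosen only for illustration.
s, theta, prior_mean, w, alpha = 4.2, 0.3, 3.5, 0.2, 0.4
x_direct = optimal_rating_direct(s, theta, prior_mean, w)
x_rho, rho = optimal_rating_rho_form(s, theta, prior_mean, w, alpha)
```

With these inputs the implied \(\rho_{i}\) is negative, which is the sign the appendix associates with a motive to deviate from prior ratings.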

Appendix B: Notes on the data generating process

B.1 Data generating process

The model presented in Section 2.1 includes random change in restaurant quality, random noise in reviewer signal, reviewer heterogeneity in stringency, social incentives, and signal precision, a quadratic time trend, and the quality of the match between the reviewer and the restaurant. Overall, one can consider the data generating process as the following steps:

  1. Restaurant r starts with an initial quality \(\mu_{r0}\) when it is first reviewed on Yelp. Denote this time as time 0. Since time 0, restaurant quality \(\mu_{r}\) evolves in a random walk process by calendar time, where an i.i.d. quality noise \(\xi _{t}\sim N(0,\sigma _{\xi }^{2})\) is added on to restaurant quality at t so that \(\mu_{rt}=\mu_{r(t-1)}+\xi_{t}\).

  2. A reviewer arrives at restaurant r at time \(t_{n}\) as r’s n th reviewer. She observes the attributes and ratings of all the previous n − 1 reviewers of r. She also obtains a signal \(s_{rt_{n}}=\mu _{rt_{n}}+\epsilon _{rn}\) of the concurrent restaurant quality where the signal noise \(\epsilon _{rn}\sim N\left (0,\sigma _{\epsilon }^{2}\right )\).

  3. The reviewer chooses an optimal rating that gives weights to both her own experience and her social incentives. The optimal rating takes the form

    $$x_{rt_{n}}=\lambda_{rn}+\rho_{n}E(\mu_{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...,x_{rt_{n-1}},s_{rt_{n}})+(1-\rho_{n})s_{rt_{n}} $$

    where \(E(\mu _{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...,x_{rt_{n-1}},s_{rt_{n}})\) is the best guess of the restaurant quality at \(t_{n}\) by Bayesian updating.

  4. The reviewer is assumed to know the attributes of all past reviewers so that she can de-bias the stringency of past reviewers. The reviewer also knows that the general population of reviewers may change taste from year to year (captured in year fixed effects \(\{\alpha_{year_{t}}\}\)), and there is a quadratic trend in \(\lambda\) by restaurant age (captured in \(\{\alpha_{age1},\alpha_{age2}\}\)). This trend could be driven by changes in reviewer stringency or restaurant quality and these two drivers are not distinguishable in the above expression for \(x_{rt_{n}}\).
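Steps 1 and 2 of the process above can be sketched as a short simulation. This is an illustrative sketch only: the function name `simulate_signals`, its arguments, and the use of per-reviewer noise standard deviations are assumptions made here, not the authors' code. It relies on the fact that the sum of independent daily shocks between two reviews is a single normal draw whose variance is the gap in days times \(\sigma_{\xi}^{2}\).

```python
import random


def simulate_signals(mu0, review_days, sigma_xi, sigma_eps, seed=0):
    """Simulate random-walk quality and noisy reviewer signals.

    mu0:         initial quality at the first review (day 0)
    review_days: increasing calendar days t_1 <= t_2 <= ... of the reviews
    sigma_xi:    standard deviation of the daily quality shock
    sigma_eps:   signal-noise standard deviation, one entry per reviewer
    Returns (qualities, signals), each aligned with review_days.
    """
    rng = random.Random(seed)
    qualities, signals = [], []
    mu, last_day = mu0, 0
    for day, s_eps in zip(review_days, sigma_eps):
        # Quality moves by the accumulated shocks since the last review.
        gap = day - last_day
        mu += rng.gauss(0.0, sigma_xi * gap ** 0.5)
        qualities.append(mu)
        # The reviewer observes current quality plus her own noise.
        signals.append(mu + rng.gauss(0.0, s_eps))
        last_day = day
    return qualities, signals


# Hypothetical inputs: three reviews on days 0, 10 and 40.
qualities, signals = simulate_signals(3.5, [0, 10, 40], 0.05, [1.0, 0.5, 1.0])
```

Setting both standard deviations to zero returns the initial quality for every review, which is a quick sanity check on the bookkeeping.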

In the Bayesian estimate of \(E(\mu _{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...,x_{rt_{n-1}},s_{rt_{n}})\), we assume the n th reviewer of r is fully rational and has perfect information about the other reviewers’ observable attributes, which according to our model determine the other reviewers’ stringency (\(\lambda\)), social preference (\(\rho\)), and signal noise (\(\sigma_{\epsilon}\)). With this knowledge, the n th reviewer of r can back out the signal of each reviewer before her; thus the Bayesian estimate of \(E(\mu _{rt_{n}}|x_{rt_{1}},x_{rt_{2}},...,x_{rt_{n-1}},s_{rt_{n}})\) can be rewritten as \(E(\mu _{rt_{n}}|s_{rt_{1}},...s_{rt_{n}})\). Typical Bayesian inference implies that a reviewer’s posterior about restaurant quality is a weighted average of previous signals and her own signal, with the weight increasing with signal precision. This is complicated by the fact that restaurant quality evolves by a martingale process, and therefore current restaurant quality is better reflected in recent reviews. Accordingly, the Bayesian estimate of \(E(\mu _{rt_{n}}|s_{rt_{1}},...s_{rt_{n}})\) should give more weight to more recent reviews even if all reviewers have the same stringency, social preference, and signal precision. The analytical derivation of \(E(\mu _{rt_{n}}|s_{rt_{1}},...s_{rt_{n}})\) is presented in Appendix B.2.

B.2 Deriving \(E(\mu _{rt_{n}}|s_{rt_{1}},...s_{rt_{n}})\)

For restaurant r, denote the prior belief of \(\mu _{rt_{n}}\) right before the realization of the n th signal as

$$\pi_{n|n-1}(\mu_{rt_{n}})=f(\mu_{rt_{n}}|s_{rt_{1}},...s_{rt_{n-1}}) $$

and we assume that the first reviewer uses an uninformative prior

$$\mu_{1|0}= 0,\sigma_{1|0}^{2}=W,\ W\ arbitrarily\ large $$

Denote the posterior belief of \(\mu _{rt_{n}}\) after observing \(s_{rt_{n}}\) as

$$h_{n|n}(\mu_{rt_{n}})=f(\mu_{rt_{n}}|s_{rt_{1}},...s_{rt_{n}}) $$

Hence

$$\begin{array}{@{}rcl@{}} h_{n|n}(\mu_{rt_{n}})\,=\,f(\mu_{rt_{n}}|s_{rt_{1}},...s_{rt_{n}}) & = & \frac{f(\mu_{rt_{n}},s_{rt_{1}},...s_{rt_{n}})}{f(s_{rt_{1}},...s_{rt_{n}})}\\ & \propto & f(\mu_{rt_{n}},s_{rt_{1}},...s_{rt_{n}})\\ & = & f(s_{rt_{n}}|\mu_{rt_{n}},s_{rt_{1}},...s_{rt_{n-1}})f(\mu_{rt_{n}},s_{rt_{1}},...s_{rt_{n-1}})\\ & = & f(s_{rt_{n}}|\mu_{rt_{n}},s_{rt_{1}},...s_{rt_{n-1}})f(\mu_{rt_{n}}|s_{rt_{1}},...s_{rt_{n-1}})f(s_{rt_{1}},...s_{rt_{n-1}})\\ & \propto & f(s_{rt_{n}}|\mu_{rt_{n}})f(\mu_{rt_{n}}|s_{rt_{1}},...s_{rt_{n-1}})\\ & = & f(s_{rt_{n}}|\mu_{rt_{n}})\pi_{n|n-1}(\mu_{rt_{n}}) \end{array} $$

where \(f(s_{rt_{n}}|\mu _{rt_{n}},s_{rt_{1}},...s_{rt_{n-1}})=f(s_{rt_{n}}|\mu _{rt_{n}})\) comes from the assumption that \(s_{rt_{n}}\) is independent of past signals conditional on \(\mu _{rt_{n}}\).

In the above formula, the prior belief of \(\mu _{rt_{n}}\) given the realization of \(\{s_{rt_{1}},...,s_{rt_{n-1}}\}\), or \(\pi _{n|n-1}(\mu _{rt_{n}})\), depends on the posterior belief of \(\mu _{rt_{n-1}}\), \(h_{n-1|n-1}(\mu _{rt_{n-1}})\), and the evolution process from \(\mu _{rt_{n-1}}\) to \(\mu _{rt_{n}}\), denoted as \(g(\mu_{n}|\mu_{n-1})\). Integrating over the unobserved \(\mu _{rt_{n-1}}\),

$$\pi_{n|n-1}(\mu_{rt_{n}})=\int g(\mu_{n}|\mu_{n-1})f(\mu_{rt_{n-1}}|s_{rt_{1}},...s_{rt_{n-1}})d\mu_{rt_{n-1}}=\int g(\mu_{n}|\mu_{n-1})h_{n-1|n-1}(\mu_{rt_{n-1}})d\mu_{rt_{n-1}} $$

Given the normality of \(\pi_{n|n-1}\), \(f(s_{rt_{n}}|\mu _{rt_{n}})\) and \(g(\mu_{n}|\mu_{n-1})\), \(h_{n|n}(\mu _{rt_{n}})\) is distributed normal. In addition, denote \(\mu_{n|n-1}\) and \(\sigma _{n|n-1}^{2}\) as the mean and variance of the random variable with normal prior density \(\pi_{n|n-1}(\mu _{rt_{n}})\), and \(\mu_{n|n}\) and \(\sigma _{n|n}^{2}\) as the mean and variance of the random variable with normal posterior density \(h_{n|n}(\mu _{rt_{n}})\). Let \(\sigma_{n}^{2}\) denote the signal-noise variance of the n th reviewer and \(s_{n}\) her signal. After combining terms in the derivation of \(\pi_{n|n-1}(\mu _{rt_{n}})\) and \(h_{n|n}(\mu _{rt_{n}})\), the mean and variance evolve according to the following rule:

$$\begin{array}{@{}rcl@{}} \mu_{n|n} &=& \mu_{n|n-1}+\frac{\sigma_{n|n-1}^{2}}{\sigma_{n|n-1}^{2}+{\sigma_{n}^{2}}}(s_{n}-\mu_{n|n-1})\\ &=& \frac{\sigma_{n|n-1}^{2}}{\sigma_{n|n-1}^{2}+{\sigma_{n}^{2}}}s_{n}+\frac{{\sigma_{n}^{2}}}{\sigma_{n|n-1}^{2}+{\sigma_{n}^{2}}}\mu_{n|n-1}\\ \sigma_{n|n}^{2} &=& \frac{{\sigma_{n}^{2}}\sigma_{n|n-1}^{2}}{\sigma_{n|n-1}^{2}+{\sigma_{n}^{2}}}\\ \mu_{n + 1|n} &=& \mu_{n|n}\\ \sigma_{n + 1|n}^{2} &=& \sigma_{n|n}^{2}+(t_{n + 1}-t_{n})\sigma_{\xi}^{2} \end{array} $$

Hence, we can deduce the beliefs from the initial prior,

$$\begin{array}{@{}rcl@{}} \mu_{1|0} & =& 0\\ \sigma_{1|0}^{2} & =& W>0\ and\ arbitrarily\ large\\ \mu_{1|1} & =& s_{1}\\ \sigma_{1|1}^{2} & =& {\sigma_{1}^{2}}\\ \mu_{2|1} & =& s_{1}\\ \sigma_{2|1}^{2} & =& {\sigma_{1}^{2}}+(t_{2}-t_{1})\sigma_{\xi}^{2}\\ \mu_{2|2} &=& \frac{{\sigma_{1}^{2}}+(t_{2}-t_{1})\sigma_{\xi}^{2}}{{\sigma_{1}^{2}}+{\sigma_{2}^{2}}+(t_{2}-t_{1})\sigma_{\xi}^{2}}s_{2}+\frac{{\sigma_{2}^{2}}}{{\sigma_{1}^{2}}+{\sigma_{2}^{2}}+(t_{2}-t_{1})\sigma_{\xi}^{2}}s_{1}\\ \sigma_{2|2}^{2} &=& \frac{{\sigma_{2}^{2}}\left( {\sigma_{1}^{2}}+(t_{2}-t_{1})\sigma_{\xi}^{2}\right)}{{\sigma_{1}^{2}}+{\sigma_{2}^{2}}+(t_{2}-t_{1})\sigma_{\xi}^{2}}\\ \mu_{3|2} & =& \mu_{2|2}\\ \sigma_{3|2}^{2} &=& \frac{{\sigma_{2}^{2}}\left( {\sigma_{1}^{2}}+(t_{2}-t_{1})\sigma_{\xi}^{2}\right)}{{\sigma_{1}^{2}}+{\sigma_{2}^{2}}+(t_{2}-t_{1})\sigma_{\xi}^{2}}+(t_{3}-t_{2})\sigma_{\xi}^{2}\\ &...& \end{array} $$

\(E(\mu _{rt_{n}}|s_{rt_{1}},...s_{rt_{n}})=\mu _{n|n}\) is derived recursively following the above formulation.
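The recursion above is a scalar Kalman filter (compare Welch and Bishop 2001 in the reference list). A minimal sketch follows, under stated assumptions: the function name `posterior_quality` and its arguments are hypothetical, and the "arbitrarily large" prior variance W is approximated by a large finite number, so the first posterior mean equals \(s_{1}\) only up to a negligible rounding term.

```python
def posterior_quality(signals, noise_vars, days, sigma_xi2, prior_var=1e12):
    """Recursive posterior mean and variance of quality after each signal.

    signals:    s_1..s_n, reviewer signals
    noise_vars: sigma_1^2..sigma_n^2, signal-noise variance per reviewer
    days:       t_1..t_n, calendar time of each review
    sigma_xi2:  variance of the per-period quality shock
    prior_var:  W, a large variance standing in for the uninformative prior
    Returns a list of (mu_{n|n}, sigma^2_{n|n}) pairs.
    """
    mu_pred, var_pred = 0.0, prior_var  # mu_{1|0} and sigma^2_{1|0}
    out = []
    for n, (s, var_n) in enumerate(zip(signals, noise_vars)):
        # Update step: weight the new signal by the prior's relative variance.
        gain = var_pred / (var_pred + var_n)
        mu_post = mu_pred + gain * (s - mu_pred)
        var_post = var_n * var_pred / (var_pred + var_n)
        out.append((mu_post, var_post))
        if n + 1 < len(days):
            # Predict step: the mean carries over, the variance grows with
            # the time elapsed until the next review.
            mu_pred = mu_post
            var_pred = var_post + (days[n + 1] - days[n]) * sigma_xi2
    return out


# Hypothetical inputs: two reviews, 20 days apart.
beliefs = posterior_quality([4.0, 3.0], [1.0, 0.5], [0, 20], 0.01)
```

For these inputs the second posterior mean should match the closed form for \(\mu_{2|2}\) given above, with \(\sigma_{2|1}^{2}=1.2\).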

B.3 The correlation of ratings induced by quality change

We assume quality evolution follows a martingale process: \(\mu_{rt}=\mu_{r(t-1)}+\xi_{t}\), where t denotes the units of calendar time since restaurant r was first reviewed and the t-specific evolution \(\xi_{t}\) conforms to \(\xi _{t}\sim i.i.d\ N\left (0,\sigma _{\xi }^{2}\right )\). This martingale process introduces a positive correlation of restaurant quality over time,

$$\begin{array}{@{}rcl@{}} Cov(\mu_{rt},\mu_{rt^{\prime}}) &=&E\left( \mu_{r0}+\sum\limits_{\tau= 1}^{t}\xi_{\tau}-E(\mu_{rt})\right)\left( \mu_{r0}+\sum\limits_{\tau= 1}^{t^{\prime}}\xi_{\tau}-E(\mu_{rt^{\prime}})\right)\\ &=&E\left( \sum\limits_{\tau= 1}^{t}\xi_{\tau}\sum\limits_{\tau= 1}^{t^{\prime}}\xi_{\tau}\right)=\sum\limits_{\tau= 1}^{t}E\left( \xi_{\tau}^{2}\right)\ if\ t<t^{\prime}, \end{array} $$

which increases with the timing of the earlier date (t) but is independent of the time between t and \(t^{\prime}\).
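The covariance result can be illustrated by brute force. The sketch below is an illustration with a hypothetical function name, not the authors' code: it expands the product of the two cumulative shock sums term by term, using the fact that independent shocks have zero cross-expectation, so only shocks shared by both dates contribute \(\sigma_{\xi}^{2}\) each.

```python
def quality_covariance(t, t_prime, sigma_xi2):
    """Cov(mu_t, mu_t') for the random walk, by expanding the double sum.

    E(xi_tau * xi_tau') equals sigma_xi2 when tau == tau' and 0 otherwise,
    because the shocks are independent with mean zero.
    """
    total = 0.0
    for tau in range(1, t + 1):
        for tau_prime in range(1, t_prime + 1):
            if tau == tau_prime:
                total += sigma_xi2
    return total


# Hypothetical shock variance of 0.5 per period.
cov_near = quality_covariance(3, 10, 0.5)
cov_far = quality_covariance(3, 100, 0.5)
cov_later = quality_covariance(5, 10, 0.5)
```

Moving the later date from 10 to 100 leaves the covariance unchanged, while moving the earlier date from 3 to 5 raises it, in line with the statement above.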

Recall that \(x_{rt_{n}}\) is the n th review written at time \(t_{n}\) since r was first reviewed. We can express the n th reviewer’s signal as:

$$\begin{array}{@{}rcl@{}} s_{rt_{n}} &=& \mu_{rt_{n}}+\epsilon_{rn}\\ where\ \ \ \mu_{rt_{n}} &=& \mu_{rt_{n-1}}+\xi_{t_{n-1}+ 1}+\xi_{t_{n-1}+ 2}+...+\xi_{t_{n}}. \end{array} $$

Signal noise \(\epsilon_{rn}\) is assumed to be i.i.d. with \(Var(s_{rt_{n}}|\mu _{rt_{n}})={\sigma _{i}^{2}}\) where i is the identity of the n th reviewer. The variance of restaurant quality at \(t_{n}\) conditional on quality at \(t_{n-1}\) is,

$$Var(\mu_{rt_{n}}|\mu_{rt_{n-1}}) = Var(\xi_{t_{n-1}+ 1}+\xi_{t_{n-1}+ 2}+...+\xi_{t_{n}})=(t_{n}-t_{n-1})\sigma_{\xi}^{2}={\Delta} t_{n}\sigma_{\xi}^{2}. $$

Note that the martingale assumption entails two features in the stochastic process: first, conditional on \(\mu _{rt_{n-1}}\), \(\mu _{rt_{n}}\) is independent of the past signals \(\{s_{rt_{1}},...,s_{rt_{n-1}}\}\); second, conditional on \(\mu _{rt_{n}}\), \(s_{rt_{n}}\) is independent of the past signals \(\{s_{rt_{1}},...,s_{rt_{n-1}}\}\). These two features greatly facilitate reviewer n’s Bayesian estimate of restaurant quality.

Appendix C: Deriving the likelihood function

C.1 Deriving the likelihood function \(f(x_{rt_{2}}-x_{rt_{1}},...,x_{rt_{N_{r}}}-x_{rt_{N_{r}-1}})\)

Because the covariance structure of \(\{x_{rt_{2}}-x_{rt_{1}},x_{rt_{3}}-x_{rt_{2}},...,x_{rt_{N_{r}}}-x_{rt_{N_{r}-1}}\}\) is complicated, we use the change of variable technique to express the likelihood \(f(x_{rt_{2}}-x_{rt_{1}},...,x_{rt_{N_{r}}}-x_{rt_{N_{r}-1}})\) by \(f(s_{rt_{2}}-s_{rt_{1}},...,s_{rt_{N_{r}}}-s_{rt_{N_{r}-1}})\),

$$f(x_{rt_{2}}-x_{rt_{1}},...,x_{rt_{N_{r}}}-x_{rt_{N_{r}-1}})=|J_{\Delta s\rightarrow{\Delta} x}|^{-1}f(s_{rt_{2}}-s_{rt_{1}},...,s_{rt_{N_{r}}}-s_{rt_{N_{r}-1}}). $$

The derivation of \(f(x_{rt_{2}}-x_{rt_{1}},...,x_{rt_{N_{r}}}-x_{rt_{N_{r}-1}})\) proceeds as follows:

  • Step 1: To derive \(f(s_{rt_{2}}-s_{rt_{1}},...,s_{rt_{N_{r}}}-s_{rt_{N_{r}-1}})\), we note that \(s_{rt_{n}}=\mu _{rt_{n}}+\epsilon _{rn}\) and thus, for any m > n, n ≥ 2, the variance and covariance structure can be written as:

    $$\begin{array}{@{}rcl@{}} &&Cov(s_{rt_{n}}-s_{rt_{n-1}},s_{rt_{m}}-s_{rt_{m-1}})\\ &&= Cov(\epsilon_{rn}-\epsilon_{rn-1}+\xi_{t_{n-1}+ 1}+...+\xi_{t_{n}},\epsilon_{rm}-\epsilon_{rm-1}+\xi_{t_{m-1}+ 1}+...+\xi_{t_{m}})\\ &&= \left\{\begin{array}{ll} -\sigma_{rn}^{2} & if\ m=n + 1\\ 0 & if\ m>n + 1 \end{array}\right.\\ && Var(s_{rt_{n}}-s_{rt_{n-1}})\\ &&= \sigma_{rn}^{2}+\sigma_{rn-1}^{2}+(t_{n}-t_{n-1})\sigma_{\xi}^{2}. \end{array} $$

    Denoting the total number of reviewers on restaurant r as \(N_{r}\), the vector of the first differences of signals as \({\Delta }s_{r}=\{s_{rt_{n}}-s_{rt_{n-1}}\}_{n = 2}^{N_{r}}\), and its variance-covariance matrix as \({\Sigma }_{\Delta s_{r}}\), we have

    $$f({\Delta} s_{r})=(2\pi)^{-\frac{N_{r}-1}{2}}|{\Sigma}_{\Delta s_{r}}|^{-1/2}exp\left( -\frac{1}{2}{\Delta} s_{r}^{\prime}{\Sigma}_{\Delta s_{r}}^{-1}{\Delta} s_{r}\right). $$

  • Step 2: We derive the value of \(\{s_{rt_{1}},...s_{rt_{N_{r}}}\}_{r = 1}^{R}\) from observed ratings \(\{x_{rt_{1}},...x_{rt_{N_{r}}}\}_{r = 1}^{R}\). Given

    $$x_{rt_{n}}=\lambda_{rn}+\rho_{n}E(\mu_{rt_{n}}|s_{rt_{1}},...s_{rt_{n}})+(1-\rho_{n})s_{rt_{n}} $$

    and \(E(\mu _{rt_{n}}|s_{rt_{1}},...s_{rt_{n}})\) as a function of \(\{s_{rt_{1}},...s_{rt_{n}}\}\) (formula in Appendix B.2), we can solve \(\{s_{rt_{1}},...s_{rt_{n}}\}\) from \(\{x_{rt_{1}},...x_{rt_{n}}\}\) according to the recursive formula in Appendix C.2.

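  • Step 3: We derive \(|J_{\Delta s\rightarrow {\Delta} x}|^{-1}\) or, equivalently, \(|J_{\Delta x\rightarrow {\Delta} s}|\), where \(J_{\Delta x\rightarrow {\Delta} s}\) is such that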

    $$\left[\begin{array}{c} s_{rt_{2}}-s_{rt_{1}}\\ ...\\ s_{rt_{n}}-s_{rt_{n-1}} \end{array}\right]=J_{\Delta x\rightarrow{\Delta} s}\left[\begin{array}{c} x_{rt_{2}}-x_{rt_{1}}\\ ...\\ x_{rt_{n}}-x_{rt_{n-1}} \end{array}\right] $$

    The analytical form of \(J_{\Delta x\rightarrow {\Delta} s}\) is available given the recursive expression for \(x_{rt_{n}}\) and \(s_{rt_{n}}\).
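Step 1 can be sketched in code for a single restaurant. The sketch below is an illustration under stated assumptions, not the authors' estimation routine: the function names are hypothetical, the density is evaluated with a hand-written Cholesky factorization to stay dependency-free, and the Jacobian term from Step 3 is left out. It builds the tridiagonal covariance matrix of the first-differenced signals and evaluates the zero-mean multivariate normal log density.

```python
import math


def diff_covariance(noise_vars, days, sigma_xi2):
    """Covariance matrix of the first differences s_n - s_{n-1}, n = 2..N."""
    k = len(noise_vars) - 1
    cov = [[0.0] * k for _ in range(k)]
    for j in range(k):  # row j is the difference between reviews j+2 and j+1
        gap = days[j + 1] - days[j]
        cov[j][j] = noise_vars[j + 1] + noise_vars[j] + gap * sigma_xi2
        if j + 1 < k:
            # Adjacent differences share one reviewer's noise, opposite sign.
            cov[j][j + 1] = cov[j + 1][j] = -noise_vars[j + 1]
    return cov


def gaussian_logpdf(x, cov):
    """Log density of a zero-mean multivariate normal, via Cholesky."""
    k = len(x)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            acc = cov[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    # Forward substitution: low * z = x, so x' cov^{-1} x = z'z.
    z = []
    for i in range(k):
        z.append((x[i] - sum(low[i][m] * z[m] for m in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(k))
    quad = sum(v * v for v in z)
    return -0.5 * (k * math.log(2.0 * math.pi) + log_det + quad)


# Hypothetical inputs: three reviews on days 0, 20 and 30.
cov = diff_covariance([1.0, 0.5, 2.0], [0, 20, 30], 0.01)
log_f = gaussian_logpdf([0.3, -0.2], cov)
```

In an estimation setting the per-restaurant log densities would be summed and combined with the log Jacobian, which is not shown here.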

C.2 Solving \(\{s_{rt_{1}},...s_{rt_{n}}\}\) from observed ratings

Solve \(\{s_{rt_{1}},...s_{rt_{n}}\}\) from \(\{x_{rt_{1}},...x_{rt_{n}}\}\) according to the following recursive formula:

$$\begin{array}{@{}rcl@{}} x_{1} &=& s_{1}+\lambda_{1}\\ s_{1} &=& x_{1}-\lambda_{1}\\ \end{array} $$
$$\begin{array}{@{}rcl@{}} x_{2} &=&\rho_{2}\frac{{\sigma_{2}^{2}}}{\sigma_{2|1}^{2}+{\sigma_{2}^{2}}}\mu_{2|1}+\rho_{2}\frac{\sigma_{2|1}^{2}}{\sigma_{2|1}^{2}+{\sigma_{2}^{2}}}s_{2}+(1-\rho_{2})s_{2}+\lambda_{2}\\ &=&\rho_{2}\frac{{\sigma_{2}^{2}}}{\sigma_{2|1}^{2}+{\sigma_{2}^{2}}}\mu_{2|1}+\left[1-\left( 1-\frac{\sigma_{2|1}^{2}}{\sigma_{2|1}^{2}+{\sigma_{2}^{2}}}\right)\rho_{2}\right]s_{2}+\lambda_{2}\\ s_{2}&=&\frac{1}{\left[1-\left( 1-\frac{\sigma_{2|1}^{2}}{\sigma_{2|1}^{2}+{\sigma_{2}^{2}}}\right)\rho_{2}\right]}\left[x_{2}-\lambda_{2}-\rho_{2}\frac{{\sigma_{2}^{2}}}{\sigma_{2|1}^{2}+{\sigma_{2}^{2}}}\mu_{2|1}\right]\\ &...&\\ s_{n} &=&\frac{1}{\left[1-\left( 1-\frac{\sigma_{n|n-1}^{2}}{\sigma_{n|n-1}^{2}+{\sigma_{n}^{2}}}\right)\rho_{n}\right]}\left[x_{n}-\lambda_{n}-\rho_{n}\frac{{\sigma_{n}^{2}}}{\sigma_{n|n-1}^{2}+{\sigma_{n}^{2}}}\mu_{n|n-1}\right]. \end{array} $$
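The recursion above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names and argument layout are hypothetical, and the first review is treated as in the formula for \(x_{1}\), where the uninformative prior makes the posterior mean equal to \(s_{1}\). A forward function that maps signals to ratings is included only so that the inversion can be checked by a round trip.

```python
def _predict(mu_post, var_post, gap, sigma_xi2):
    """Carry the posterior forward to the next review date."""
    return mu_post, var_post + gap * sigma_xi2


def signals_from_ratings(ratings, lambdas, rhos, noise_vars, days, sigma_xi2):
    """Back out s_1..s_n from x_1..x_n using the recursive formula."""
    signals = []
    mu_pred = var_pred = None  # no informative prior before the first review
    for n, x in enumerate(ratings):
        if n == 0:
            s = x - lambdas[0]  # x_1 = s_1 + lambda_1
            mu_post, var_post = s, noise_vars[0]
        else:
            gain = var_pred / (var_pred + noise_vars[n])
            prior_weight = (1.0 - gain) * rhos[n]
            s = (x - lambdas[n] - prior_weight * mu_pred) / (1.0 - prior_weight)
            mu_post = mu_pred + gain * (s - mu_pred)
            var_post = noise_vars[n] * var_pred / (var_pred + noise_vars[n])
        signals.append(s)
        if n + 1 < len(ratings):
            gap = days[n + 1] - days[n]
            mu_pred, var_pred = _predict(mu_post, var_post, gap, sigma_xi2)
    return signals


def ratings_from_signals(signals, lambdas, rhos, noise_vars, days, sigma_xi2):
    """Forward map x_n = lambda_n + rho_n * mu_{n|n} + (1 - rho_n) * s_n."""
    ratings = []
    mu_pred = var_pred = None
    for n, s in enumerate(signals):
        if n == 0:
            mu_post, var_post = s, noise_vars[0]
        else:
            gain = var_pred / (var_pred + noise_vars[n])
            mu_post = mu_pred + gain * (s - mu_pred)
            var_post = noise_vars[n] * var_pred / (var_pred + noise_vars[n])
        ratings.append(lambdas[n] + rhos[n] * mu_post + (1.0 - rhos[n]) * s)
        if n + 1 < len(signals):
            gap = days[n + 1] - days[n]
            mu_pred, var_pred = _predict(mu_post, var_post, gap, sigma_xi2)
    return ratings


# Hypothetical parameter values for three reviews.
true_signals = [4.0, 3.0, 3.5]
params = ([0.1, -0.2, 0.0], [0.3, 0.3, 0.1], [1.0, 0.5, 2.0], [0, 20, 30], 0.01)
ratings = ratings_from_signals(true_signals, *params)
recovered = signals_from_ratings(ratings, *params)
```

The recovered signals should coincide with the ones used to generate the ratings, up to floating-point error.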

Appendix D: Tables

Table 7 What explains the variance of Yelp ratings?
Table 8 Variability of ratings declines over time
Table 9 Examine serial correlation in restaurant ratings
Table 10 Does matching improve over time?
Table 11 Baseline model with different quality update frequency assumptions
Table 12 Estimation results: limited attention model vs. bayesian rational model
Table 13 Characteristics of review with different simple and adjusted averages gaps

Appendix E: Figures

Fig. 3

Distribution of ratings relative to restaurant mean by elite status. Notes: 1. The figure on the left plots the distribution of \(Rating_{rn}-\overline {Rating}_{rn}^{BF}\), where \(Rating_{rn}\) is the nth rating on restaurant r, and \(\overline {Rating}_{rn}^{BF}\) is the arithmetic mean of the past n − 1 ratings on restaurant r before n. Similarly, the figure on the right plots the distribution of \(Rating_{rn}-\overline {Rating}_{rn}^{AF}\), where \(\overline {Rating}_{rn}^{AF}\) is the arithmetic mean of future ratings on restaurant r until the end of our sample. 2. These figures show that ratings by elite reviewers are closer to a restaurant’s average rating.

Fig. 4

Restaurants experience a “chilling effect”. Notes: This figure shows the rating trend within a restaurant over time. Ratings are on average more favorable to restaurants in the beginning and decline over time. We plot a fractional-polynomial fit of the restaurant residual against the sequence of reviews. Residuals \(\epsilon _{rn,year}\) are obtained from the regression \(Rating_{rn,year}=\mu _{r}+\gamma _{year}+\epsilon _{rn,year}\).

Fig. 5

Fractional-polynomial fit of within-restaurant rating trend (ethnic vs. non-ethnic restaurants).

Fig. 6

Adjusted and simple averages comparison: reviewers with different social incentives. Notes: The above figures plot the simulated 95% confidence intervals for adjusted and simple average ratings in predicting true restaurant quality. When reviewers have popularity concerns, arithmetic and adjusted averages are both unbiased estimates of true quality. But, relative to the arithmetic average, the adjusted average converges faster to the true quality, and the relative efficiency of the adjusted average is greater when reviewers’ social incentive is larger.

Fig. 7

Adjusted and simple averages comparison: reviewers with different precisions. Notes: The above figures plot the simulated 95% confidence intervals for the average ratings that would occur for a restaurant at a given quality level. When reviewers differ in precision, both arithmetic and adjusted averages are unbiased estimates of true quality. But, relative to the arithmetic average, the adjusted average converges faster to true quality. The difference in convergence speed increases when elite reviewers’ precision relative to that of non-elite reviewers is larger.

Fig. 8

Adjusted and simple averages comparison: restaurants with quality random walk. Notes: 1. The above figures plot the mean absolute errors of adjusted and simple average ratings in estimating quality when quality evolves as a random walk. To remove randomness in review frequency, we fix the frequency of reviews on each restaurant at one per month. We simulate a history of 60 reviews, or a time span of 5 years. 2. The figures show that the simple average becomes less accurate in representing true quality over time, while the adjusted average keeps the same level of mean absolute error. The error of the simple average relative to the adjusted average is greater if a restaurant has larger variance in quality and changes quality more frequently.

Fig. 9

Simulations when reviewers are biased. Notes: The above figures plot the simulated mean and 95% confidence interval for the average ratings that would occur for restaurants with biased reviewers. The figure on the left assumes that restaurants have quality fixed at 3 and that reviewers’ bias trends downwards with restaurant age. The figure on the right assumes that restaurant quality trends downwards with restaurant age and that reviewer bias is unaffected by restaurant age. In both cases, we assume that reviewers fully account for other reviewers’ biases and the common restaurant quality trend. So in both cases, the adjusted average is an unbiased estimate of true quality, while the simple average is biased because it does not correct for the review bias.

Fig. 10

Adjusted and simple averages comparison: “fake” review. Notes: The above figures plot the simulated mean of the average ratings for a single restaurant whose quality follows the random walk process. A “fake” review that is fixed at 1.5 appears as the first or the fifth review on the restaurant. We consider a review “fake” if the reviewer misreports her signal. Both aggregation algorithms weight past ratings and are affected by the “fake” rating. But compared with the arithmetic mean, the adjusted average “forgets” earlier ratings and converges back to the true quality at a faster rate.

Appendix F: “Restaurant reviews beliefs survey” questionnaire

To test our model against an external source of information, we conducted an online survey using Amazon MTurk (“Restaurant Reviews Beliefs Survey,” February 1, 2016) in which we asked how respondents used and understood online restaurant ratings (we did not mention Yelp in the survey). In total, 239 MTurk workers responded to our survey. Screenshots of the questionnaire are shown below.

figure a
figure b


Cite this article

Dai, W., Jin, G., Lee, J. et al. Aggregation of consumer ratings: an application to Yelp.com. Quant Mark Econ 16, 289–339 (2018). https://doi.org/10.1007/s11129-017-9194-9
