Evaluating Recommender Systems

  • Charu C. Aggarwal


The evaluation of collaborative filtering has much in common with that of classification, because collaborative filtering can be viewed as a generalization of the classification and regression modeling problem (cf. section of Chapter 1).


Root Mean Square Error, Receiver Operating Characteristic Curve, Recommender System, Rating Matrix, Recommendation Algorithm



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Charu C. Aggarwal (1)
  1. IBM T.J. Watson Research Center, Yorktown Heights, USA
