Evaluating Recommender Systems

  • Charu C. Aggarwal


The evaluation of collaborative filtering shares a number of similarities with that of classification, because collaborative filtering can be viewed as a generalization of the classification and regression modeling problem (cf. section of Chapter 1).
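To make the analogy concrete, the following is a minimal sketch of a hold-out evaluation in which observed rating entries play the role that labeled test instances play in classification. It is an illustration under stated assumptions, not the chapter's own procedure: the toy ratings, the item-mean baseline, and the helper names `holdout_split`, `fit_item_means`, `rmse`, and `mae` are all hypothetical.

```python
import math
import random


def holdout_split(ratings, test_fraction=0.25, seed=0):
    """Hide a random fraction of observed (user, item, rating) entries.

    This mirrors holding out labeled instances in classification, except
    that individual matrix entries are hidden rather than whole rows.
    """
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_item_means(train):
    """Hypothetical baseline: predict each item's mean training rating."""
    totals, counts = {}, {}
    for _, item, rating in train:
        totals[item] = totals.get(item, 0.0) + rating
        counts[item] = counts.get(item, 0) + 1
    global_mean = sum(r for _, _, r in train) / len(train)

    def predict(user, item):
        # Fall back to the global mean for items unseen in training.
        if item in counts:
            return totals[item] / counts[item]
        return global_mean

    return predict


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


# Toy data, invented purely for illustration.
ratings = [
    ("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4), ("u2", "i3", 2),
    ("u3", "i2", 4), ("u3", "i3", 1), ("u4", "i1", 5), ("u4", "i2", 2),
]
train, test = holdout_split(ratings)
predict = fit_item_means(train)
pairs = [(predict(u, i), r) for u, i, r in test]
print(f"RMSE={rmse(pairs):.3f}  MAE={mae(pairs):.3f}")
```

The model is fitted only on the visible entries and scored only on the hidden ones, which is the same discipline used to avoid overly optimistic error estimates in classification and regression.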



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Charu C. Aggarwal, IBM T.J. Watson Research Center, Yorktown Heights, USA
