Towards reproducibility in recommender-systems research

Published in: User Modeling and User-Adapted Interaction

Abstract

Numerous recommendation approaches are in use today. However, comparing their effectiveness is a challenging task because evaluation results are rarely reproducible. In this article, we examine the challenge of reproducibility in recommender-system research. We conduct experiments using Plista’s news recommender system and Docear’s research-paper recommender system. The experiments show that there are large discrepancies in the effectiveness of identical recommendation approaches in only slightly different scenarios, as well as large discrepancies for slightly different approaches in identical scenarios. For example, in one news-recommendation scenario, the performance of a content-based filtering approach was twice as high as that of the second-best approach, while in another scenario the same content-based filtering approach was the worst-performing approach. We found several determinants that may contribute to the large discrepancies observed in recommendation effectiveness. Determinants we examined include user characteristics (gender and age), datasets, weighting schemes, the time at which recommendations were shown, and user-model size. Some of the determinants have interdependencies. For instance, the optimal size of an algorithm’s user model depended on users’ age. Since minor variations in approaches and scenarios can lead to significant changes in a recommendation approach’s performance, ensuring reproducibility of experimental results is difficult. We discuss these findings and conclude that to ensure reproducibility, the recommender-system community needs to (1) survey other research fields and learn from them, (2) find a common understanding of reproducibility, (3) identify and understand the determinants that affect reproducibility, (4) conduct more comprehensive experiments, (5) modernize publication practices, (6) foster the development and use of recommendation frameworks, and (7) establish best-practice guidelines for recommender-systems research.
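
A minimal sketch may make the kind of determinant analysis described above more concrete. The following Python snippet is not the authors' code; user_docs, candidate_docs, and clicked are hypothetical inputs. It evaluates one content-based filtering approach while sweeping two of the determinants named in the abstract, the term-weighting scheme and the user-model size, instead of reporting a single configuration:

    from itertools import product

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def precision_at_10(user_docs, candidate_docs, clicked, weighting, model_size):
        # Build a user model from the user's documents under one combination of
        # determinants: the term-weighting scheme and the user-model size.
        vectorizer = (TfidfVectorizer(max_features=model_size) if weighting == "tf-idf"
                      else CountVectorizer(max_features=model_size))
        matrix = vectorizer.fit_transform(user_docs + candidate_docs)
        user_model = np.asarray(matrix[:len(user_docs)].mean(axis=0))
        # Rank candidates by similarity to the user model and measure
        # precision@10 against the candidates the user actually clicked.
        scores = cosine_similarity(user_model, matrix[len(user_docs):])[0]
        top10 = np.argsort(scores)[::-1][:10]
        return len(set(top10) & clicked) / 10.0

    # user_docs and candidate_docs are lists of strings, clicked is a set of
    # candidate indices; all three are assumed to be loaded from some dataset.
    for weighting, size in product(["tf", "tf-idf"], [10, 50, 250, 1000]):
        p10 = precision_at_10(user_docs, candidate_docs, clicked, weighting, size)
        print(f"{weighting:>6}  model size {size:>4}  precision@10 = {p10:.3f}")

If the ranking of configurations changes when the same sweep is run on a second dataset, that is exactly the kind of non-reproducibility the experiments in this article document.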


Notes

  1. Some of the definitions have been previously introduced by Beel (2015).

  2. http://plista.com.

  3. By “well-performing” we mean that if an algorithm was the most effective on a particular news site, it should also be the most effective on other news sites, or at least be among the most effective.

  4. For more details on the algorithms and the evaluation, refer to Lommatzsch (2014a, b).

  5. Please note that the results in this section are not statistically significant; a further analysis based on more data is required (a minimal sketch of such a significance check follows these notes).

  6. Several suggestions are inspired by Ekstrand et al. (2011b) and Konstan and Adomavicius (2013).
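
As a concrete version of the check note 5 calls for, a two-proportion z-test on click counts could be used to decide whether an observed difference between two algorithms is statistically significant. This is a minimal sketch; the click and impression counts below are placeholders, not data from the article:

    from statsmodels.stats.proportion import proportions_ztest

    # Hypothetical clicks and impressions for two recommendation algorithms.
    clicks = [412, 380]
    impressions = [100000, 100000]

    # Two-sided test of whether the two click-through rates differ.
    z_stat, p_value = proportions_ztest(clicks, impressions)
    print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
    if p_value >= 0.05:
        print("Difference is not significant at the 5% level; more data is needed.")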

References

  • Al-Maskari, A., Sanderson, M., Clough, P.: The relationship between IR effectiveness measures and user satisfaction. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 773–774. ACM, New York (2007)

  • Amatriain, X., Pujol, J., Oliver, N.: I like it... I like it not: evaluating user ratings noise in recommender systems. In: Carberry, S., Weibelzahl, S., Micarelli, A., Semeraro, G. (eds.) User Modeling, Adaptation, and Personalization, pp. 247–258. Springer, Berlin (2009)

  • Beel, J.: Towards effective research-paper recommender systems and user modeling based on mind maps. PhD Thesis. Otto-von-Guericke Universität Magdeburg (2015)

  • Beel, J., Langer, S.: A comparison of offline evaluations, online evaluations, and user studies in the context of research-paper recommender systems. In: Kapidakis, S., Mazurek, C., Werla, M. (eds.) Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries (TPDL). Lecture Notes in Computer Science, pp. 153–168 (2015). doi:10.1007/978-3-319-24592-8_12

  • Beel, J., Gipp, B., Shaker, A., Friedrich, N.: SciPlore Xtract: extracting titles from scientific PDF documents by analyzing style information (font size). In: Lalmas, M., Jose, J., Rauber, A., Sebastiani, F., Frommholz, I. (eds.) Research and Advanced Technology for Digital Libraries. Proceedings of the 14th European Conference on Digital Libraries (ECDL’10). Lecture Notes in Computer Science (LNCS), pp. 413–416. Springer, Glasgow (2010)

  • Beel, J., Gipp, B., Langer, S., Genzmehr, M.: Docear: an academic literature suite for searching, organizing and creating academic literature. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL). JCDL’11, pp. 465–466. ACM, New York (2011). doi:10.1145/1998076.1998188

  • Beel, J., Langer, S., Genzmehr, M.: Sponsored versus organic (Research Paper) recommendations and the impact of labeling. In: Aalberg, T., Dobreva, M., Papatheodorou, C., Tsakonas, G., Farrugia, C. (eds.) Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 395–399. Springer, Valletta (2013)

  • Beel, J., Langer, S., Genzmehr, M., Gipp, B., Breitinger, C., Nürnberger, A.: Research paper recommender system evaluation: a quantitative literature survey. In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys). ACM International Conference Proceedings Series (ICPS), pp. 15–22. ACM, New York (2013b). doi:10.1145/2532508.2532512

  • Beel, J., Langer, S., Genzmehr, M., Gipp, B., Nürnberger, A.: A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation. In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys). ACM International Conference Proceedings Series (ICPS), pp. 7–14 (2013c). doi:10.1145/2532508.2532511

  • Beel, J., Langer, S., Genzmehr, M., Müller, C.: Docears PDF inspector: title extraction from PDF files. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’13), pp. 443–444. ACM, New York (2013d). doi:10.1145/2467696.2467789

  • Beel, J., Langer, S., Genzmehr, M., Nürnberger, A.: Introducing Docear’s research paper recommender system. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’13), pp. 459–460. ACM, New York (2013e). doi:10.1145/2467696.2467786

  • Beel, J., Langer, S., Genzmehr, M., Nürnberger, A.: Persistence in recommender systems: giving the same recommendations to the same users multiple times. In: Aalberg, T., Dobreva, M., Papatheodorou, C., Tsakonas, G., Farrugia, C. (eds.) Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013). Lecture Notes in Computer Science (LNCS), pp. 390–394. Springer, Valletta (2013f)

  • Beel, J., Langer, S., Nürnberger, A., Genzmehr, M.: The impact of demographics (age and gender) and other user characteristics on evaluating recommender systems. In: Aalberg, T., Dobreva, M., Papatheodorou, C., Tsakonas, G., Farrugia, C. (eds.) Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 400–404. Springer, Valletta (2013)

  • Beel, J., Langer, S., Genzmehr, M., Gipp, B.: Utilizing mind-maps for information retrieval and user modelling. In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P., Houben, G.-J. (eds.) Proceedings of the 22nd Conference on User Modelling, Adaption, and Personalization (UMAP). Lecture Notes in Computer Science, pp. 301–313. Springer, Berlin (2014a). doi:10.1007/978-3-319-08786-3_26

  • Beel, J., Langer, S., Gipp, B., Nürnberger, A.: The architecture and datasets of Docear’s research paper recommender system. D-Lib Magazine 20, 11/12 (2014b). doi:10.1045/november14-beel

  • Beel, J., Gipp, B., Langer, S., Breitinger, C.: Research paper recommender systems: a literature survey. Int. J. Digital Libr. 1–34 (2015a). doi:10.1007/s00799-015-0156-0

  • Beel, J., Langer, S., Kapitsaki, G.M., Breitinger, C., Gipp, B.: Exploring the potential of user modeling based on mind maps. In: Ricci, F., Bontcheva, K., Conlan, O., Lawless, S. (eds.) Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP). Lecture Notes in Computer Science, pp. 3–17. Springer, Berlin (2015b). doi:10.1007/978-3-319-20267-9_1

  • Bellogin, A., Castells, P., Said, A., Tikk, D.: Report on the workshop on reproducibility and replication in recommender systems evaluation (RepSys). In: ACM SIGIR forum, pp. 29–35. ACM, New York (2014)

  • Bethard, S., Jurafsky, D.: Who should I cite: learning literature search models from citation behavior. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 609–618. ACM, New York (2010)

  • Bobadilla, J., Ortega, F., Hernando, A., Gutiérrez, A.: Recommender systems survey. Knowl.-Based Syst. 46, 109–132 (2013)

  • Bogers, T., van den Bosch, A.: Comparing and evaluating information retrieval algorithms for news recommendation. In: RecSys’07, pp. 141–144. ACM, Minneapolis (2007)

  • Bogers, T., van den Bosch, A.: Recommending scientific articles using citeulike. In: Proceedings of the 2008 ACM conference on Recommender systems, pp. 287–290. ACM, New York (2008)

  • Bollen, J., Rocha, L.M.: An adaptive systems approach to the implementation and evaluation of digital library recommendation systems. In: Proceedings of the 4th European Conference on Digital Libraries, pp. 356–359. Springer, Berlin (2000)

  • Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43–52. Morgan Kaufmann, San Francisco (1998)

  • Buckheit, J.B., Donoho, D.L.: Wavelab and reproducible research. Wavelets and Statistics. Lecture Notes in Statistics, pp. 55–81. Springer, Berlin (1995)

  • Burns, A.C., Bush, R.F.: Marketing Research, 7th edn. Prentice Hall, Upper Saddle River (2013)

  • Casadevall, A., Fang, F.C.: Reproducible science. Infect. Immun. 78(12), 4972–4975 (2010)

  • CiteULike. My Top Recommendations. Webpage (http://www.citeulike.org/profile/joeran/recommendations) (2011)

  • Cremonesi, P., Garzotto, F., Negro, S., Papadopoulos, A.V., Turrin, R.: Looking for “good” recommendations: a comparative evaluation of recommender systems. In: Human-Computer Interaction-INTERACT 2011, pp. 152–168. Springer, Berlin (2011)

  • Cremonesi, P., Garzotto, F., Turrin, R.: Investigating the persuasion potential of recommender systems from a quality perspective: An empirical study. ACM Trans. Interact. Intell. Syst. 2(2), 1–11 (2012)

  • Davies, M.: Concept mapping, mind mapping and argument mapping: what are the differences and do they matter? High. Educ. 62(3), 279–301 (2011)

  • Deyo, R.A., Diehr, P., Patrick, D.L.: Reproducibility and responsiveness of health status measures statistics and strategies for evaluation. Control. Clin. Trials 12(4), S142–S158 (1991)

  • Dong, R., Tokarchuk, L., Ma, A.: Digging friendship: paper recommendation in social network. In: Proceedings of Networking & Electronic Commerce Research Conference (NAEC 2009), pp. 21–28 (2009)

  • Domingues Garcia, R., Bender, M., Anjorin, M., Rensing, C., Steinmetz, R.: FReSET: an evaluation framework for folksonomy-based recommender systems. In: Proceedings of the 4th ACM RecSys Workshop on Recommender Systems and the Social Web, pp. 25–28. ACM, New York (2012)

  • Downing, S.M.: Reliability: on the reproducibility of assessment data. Med. Educ. 38(9), 1006–1012 (2004)

  • Drummond, C.: Replicability is not reproducibility: nor is it good science. In: Proceedings of the Evaluation Methods for Machine Learning Workshop at the 26th ICML (2009)

  • Ekstrand, M.D., Kannan, P., Stemper, J.A., Butler, J.T., Konstan, J.A., Riedl, J.T.: Automatically building research reading lists. In: Proceedings of the Fourth ACM Conference on Recommender Systems, pp. 159–166. ACM, New York (2010)

  • Eckart de Castilho, R., Gurevych, I.: A lightweight framework for reproducible parameter sweeping in information retrieval. In: Proceedings of the 2011 Workshop on Data InfrastructurEs for Supporting Information Retrieval Evaluation, pp. 7–10. ACM, New York (2011)

  • Ekstrand, M.D., Ludwig, M., Kolb, J., Riedl, J.T.: LensKit: a modular recommender framework. In: Proceedings of the fifth ACM Conference on Recommender Systems, pp. 349–350. ACM, New York (2011a)

  • Ekstrand, M.D., Ludwig, M., Konstan, J.A., Riedl, J.T.: Rethinking the recommender research ecosystem: reproducibility, openness, and LensKit. In: Proceedings of the fifth ACM Conference on Recommender Systems, pp. 133–140. ACM, New York (2011b)

  • Felfernig, A., Jeran, M., Ninaus, G., Reinfrank, F., Reiterer, S.: Toward the next generation of recommender systems: applications and research challenges. In: Multimedia Services in Intelligent Environments, pp. 81–98. Springer, Berlin (2013)

  • Flyvbjerg, B.: Making Social Science Matter: Why Social Inquiry Fails and How It Can Succeed Again. Cambridge University Press, Cambridge (2001)

  • Gantner, Z., Rendle, S., Freudenthaler, C., Schmidt-Thieme, L.: MyMediaLite: a free recommender system library. In: Proceedings of the Fifth ACM Conference on Recommender Systems, pp. 305–308. ACM, New York (2011)

  • Ge, M., Delgado-Battenfeld, C., Jannach, D.: Beyond accuracy: evaluating recommender systems by coverage and serendipity. In: Proceedings of the Fourth ACM Conference on Recommender Systems, pp. 257–260. ACM, New York (2010)

  • Gunawardana, A., Shani, G.: A survey of accuracy evaluation metrics of recommendation tasks. J. Mach. Learn. Res. 10, 2935–2962 (2009)

  • Guo, G., Zhang, J., Sun, Z., Yorke-Smith, N.: LibRec: a Java library for recommender systems. In: Posters, Demos, Late-breaking Results and Workshop Proceedings of the 23rd International Conference on User Modeling, Adaptation and Personalization (2015)

  • Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)

  • Hahsler, M.: Recommenderlab: a framework for developing and testing recommendation algorithms. (2011). https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf

  • Hawking, D., Craswell, N., Thistlewaite, P., Harman, D.: Results and challenges in web search evaluation. Comput. Netw. 31(11), 1321–1330 (1999)

  • Hayes, C., Massa, P., Avensani, P., Cunningham, P.: An on-line evaluation framework for recommender systems. In: Proceedings of the AH’2002 Workshop on Recommendation and Personalization in eCommerce. Department of Computer Science, Trinity College Dublin (2002)

  • He, Q., Pei, J., Kifer, D., Mitra, P., Giles, L.: Context-aware citation recommendation. In: Proceedings of the 19th International Conference on World Wide Web, pp. 421–430. ACM, New York (2010)

  • He, J., Nie, J.-Y., Lu, Y., Zhao, W.X.: Position-aligned translation model for citation recommendation. In: Proceedings of the 19th International Conference on String Processing and Information Retrieval, pp. 251–263. Springer, Berlin (2012)

  • Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22(1), 5–53 (2004)

  • Hersh, W. et al.: Do batch and user evaluations give the same results? In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 17–24. ACM, New York (2000a)

  • Hersh, W.R. et al.: Further analysis of whether batch and user evaluations give the same results with a question-answering task. In: Proceedings of the Ninth Text REtrieval Conference (TREC 9), pp. 16–25 (2000b)

  • Hoeymans, N., Wouters, E.R.C.M., Feskens, E.J.M., van den Bos, G.A.M., Kromhout, D.: Reproducibility of performance-based and self-reported measures of functional status. J. Gerontol. Ser. A 52(6), M363–M368 (1997)

  • Hofmann, K., Schuth, A., Bellogin, A., de Rijke, M.: Effects of position bias on click-based recommender evaluation. In: Advances in Information Retrieval, pp. 624–630. Springer, Berlin (2014)

  • Holland, B., Holland, L., Davies, J.: An investigation into the concept of mind mapping and the use of mind mapping software to support and improve student academic performance, pp. 89–94. Centre for Learning and Teaching - Learning and Teaching Project Report, University of Wolverhampton, Wolverhampton (2004)

  • Huang, W., Kataria, S., Caragea, C., Mitra, P., Giles, C.L., Rokach, L.: Recommending citations: translating papers into references. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1910–1914. ACM, New York (2012)

  • Jannach, D.: Recommender systems: an introduction. Lecture Slides (PhD School 2014) (2014)

  • Jannach, D., Zanker, M., Ge, M., Gröning, M.: Recommender systems in computer science and information systems: a landscape of research. In: Proceedings of the 13th International Conference on Electronic Commerce and Web Technologies (EC-Web), pp. 76–87. Springer, Berlin (2012)

  • Jannach, D., Lerche, L., Gedikli, F., Bonnin, G.: What recommenders recommend: an analysis of accuracy, popularity, and sales diversity effects. In: Carberry, S., Weibelzahl, S., Micarelli, A., Semeraro, G. (eds.) User Modeling, Adaptation, and Personalization, pp. 25–37. Springer, Heidelberg (2013)

  • Knijnenburg, B.P., Willemsen, M.C., Kobsa, A.: A pragmatic procedure to support the user-centric evaluation of recommender systems. In: Proceedings of the fifth ACM Conference on Recommender Systems, pp. 321–324. ACM, New York (2011)

  • Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C.: Explaining the user experience of recommender systems. User Model. User-Adap. Inter. 22(4–5), 441–504 (2012)

  • Koers, H., Gabriel, A., Capone, R.: Executable papers in computer science go live on ScienceDirect (2013). https://www.elsevier.com/connect/executable-papers-in-computer-science-go-live-on-sciencedirect

  • Konstan, J.A., Adomavicius, G.: Toward identification and adoption of best practices in algorithmic recommender systems research. In: Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation, pp. 23–28. ACM, New York (2013)

  • Konstan, J., Ekstrand, M.D.: Introduction to Recommender Systems. Coursera Lecture Slides (2015)

  • Konstan, J.A., Riedl, J.: Recommender systems: from algorithms to user experience. User Model. User-Adap. Inter. 22(1–2), 1–23 (2012)

  • Kowald, D., Lacic, E., Trattner, C.: Tagrec: towards a standardized tag recommender benchmarking framework. In: Proceedings of the 25th ACM Conference on Hypertext and Social Media, pp. 305–307. ACM, New York (2014)

  • Langer, S., Beel, J.: The Comparability of recommender system evaluations and characteristics of Docear’s users. In: Proceedings of the Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD) at the 2014 ACM Conference Series on Recommender Systems (RecSys). CEUR-WS, pp. 1–6 (2014)

  • Lommatzsch, A.: Real-time news recommendation using context-aware ensembles. In: Proceedings of the 36th European Conference on Information Retrieval (ECIR), pp. 51–62. Springer, New York (2014a)

  • Lommatzsch, A.: Real-time news recommendation using context-aware ensembles. PowerPoint Presentation, http://euklid.aot.tu-berlin.de/andreas/20140414__ECIR/20140414__Lommatzsch-ECIR2014.pdf (2014b)

  • Lu, Y., He, J., Shan, D., Yan, H.: Recommending citations with translation model. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 2017–2020. ACM, New York (2011)

  • Manouselis, N., Verbert, K.: Layered evaluation of multi-criteria collaborative filtering for scientific paper recommendation. In: Procedia Computer Science, pp. 1189–1197. Elsevier, New York (2013)

  • McNee, S.M. et al.: On the recommending of citations for research papers. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work, pp. 116–125. ACM, New Orleans (2002). doi:10.1145/587078.587096

  • McNee, S.M., Kapoor, N., Konstan, J.A.: Don’t look stupid: avoiding pitfalls when recommending research papers. In: Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, pp. 171–180. ACM, New York (2006)

  • McNutt, M.: Reproducibility. Science 343(6168), 229–229 (2014)

  • Melville, P., Mooney, R.J., Nagarajan, R.: Content-boosted collaborative filtering for improved recommendations. In: Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp. 187–192. AAAI Press, Menlo Park (2002)

  • Open Science Collaboration: Estimating the reproducibility of psychological science. Science 349(6251) (2015). doi:10.1126/science.aac4716

  • Pennock, D.M., Horvitz, E., Lawrence, S., Giles, C.L.: Collaborative filtering by personality diagnosis: a hybrid memory- and model-based approach. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 473–480. Morgan Kaufmann Publishers Inc., Burlington (2000)

  • Popper, K.: The Logic of Scientific Discovery. Hutchinson, London (1959)

  • Pu, P., Chen, L., Hu, R.: A user-centric evaluation framework for recommender systems. In: Proceedings of the fifth ACM Conference on Recommender Systems, pp. 157–164. ACM, New York (2011)

  • Pu, P., Chen, L., Hu, R.: Evaluating recommender systems from the user’s perspective: survey of the state of the art. User Model. User-Adapt. Interaction 22, 1–39 (2012)

  • Rehman, J.: Cancer research in crisis: Are the drugs we count on based on bad science? (2013). http://www.salon.com/2013/09/01/is_cancer_research_facing_a_crisis/

  • Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pp. 175–186. ACM, New York (1994)

  • Ricci, F., Rokach, L., Shapira, B., Kantor, P.B.: Recommender Systems Handbook. Springer, New York (2011)

  • Ricci, F., Rokach, L., Shapira, B., Kantor, P.B.: Recommender Systems Handbook, 2nd edn. Springer, New York (2015)

  • Rich, E.: User modeling via stereotypes. Cognit. Sci. 3(4), 329–354 (1979)

  • Rothwell, P.M., Martyn, C.N.: Reproducibility of peer review in clinical neuroscience. Brain 123(9), 1964–1969 (2000)

  • Said, A.: Evaluating the Accuracy and Utility of Recommender Systems. PhD Thesis. Technische Universität Berlin (2013)

  • Said, A., Bellogin, A.: Rival: a toolkit to foster reproducibility in recommender system evaluation. In: Proceedings of the 8th ACM Conference on Recommender Systems, pp. 371–372. ACM, New York (2014)

  • Schein, A.I., Popescul, A., Ungar, L.H., Pennock, D.M.: Methods and metrics for cold-start recommendations. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 253–260. ACM, New York (2002)

  • Schmidt, S.: Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Rev. Gen. Psychol. 13(2), 90 (2009)

  • Shani, G., Gunawardana, A.: Evaluating recommendation systems. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 257–297. Springer, New York (2011)

  • Sharma, L., Gera, A.: A survey of recommendation systems: research challenges. Int. J. Eng. Trends Technol. 4(5), 1989–1992 (2013)

  • Shi, Y., Larson, M., Hanjalic, A.: Collaborative filtering beyond the user-item matrix: a survey of the state of the art and future challenges. ACM Comput. Surv. 47(1), 3:1–3:45 (2014). doi:10.1145/2556270

  • Sonnenburg, S., et al.: The need for open source software in machine learning. J. Mach. Learn. Res. 8, 2443–2466 (2007)

  • Thomas, D., Greenberg, A., Calarco, P.: Scholarly usage based recommendations: evaluating bX for a consortium presentation. http://igelu.org/wp-content/uploads/2011/09/bx_igelu_presentation_updated_september-13.pdf. (2011)

  • Torres, R., McNee, S.M., Abel, M., Konstan, J.A., Riedl, J.: Enhancing digital libraries with TechLens+. In: Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 228–236. ACM, New York (2004)

  • Turpin, A.H., Hersh, W.: Why batch and user evaluations do not give the same results. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 225–231. ACM, New York (2001)

  • Voorhees, E.M.: TREC: improving information access through evaluation. Bull. Am. Soc. Inf. Sci. Technol. 32(1), 16–21 (2005)

  • Zarrinkalam, F., Kahani, M.: SemCiR: a citation recommendation system based on a novel semantic distance measure. Program 47(1), 92–112 (2013)

  • Zheng, H., Wang, D., Zhang, Q., Li, H., Yang, T.: Do clicks measure recommendation relevancy? An empirical user study. In: Proceedings of the fourth ACM Conference on Recommender Systems, pp. 249–252. ACM, New York (2010)

Author information

Corresponding author

Correspondence to Joeran Beel.

About this article

Cite this article

Beel, J., Breitinger, C., Langer, S. et al. Towards reproducibility in recommender-systems research. User Model User-Adap Inter 26, 69–101 (2016). https://doi.org/10.1007/s11257-016-9174-x

