Towards reproducibility in recommender-systems research

Published in: User Modeling and User-Adapted Interaction

Abstract

Numerous recommendation approaches are in use today. However, comparing their effectiveness is a challenging task because evaluation results are rarely reproducible. In this article, we examine the challenge of reproducibility in recommender-system research. We conduct experiments using Plista’s news recommender system and Docear’s research-paper recommender system. The experiments show that there are large discrepancies in the effectiveness of identical recommendation approaches in only slightly different scenarios, as well as large discrepancies for slightly different approaches in identical scenarios. For example, in one news-recommendation scenario, the performance of a content-based filtering approach was twice as high as that of the second-best approach, while in another scenario the same content-based filtering approach was the worst-performing approach. We found several determinants that may contribute to the large discrepancies observed in recommendation effectiveness. Determinants we examined include user characteristics (gender and age), datasets, weighting schemes, the time at which recommendations were shown, and user-model size. Some of the determinants have interdependencies. For instance, the optimal size of an algorithm’s user model depended on users’ age. Since minor variations in approaches and scenarios can lead to significant changes in a recommendation approach’s performance, ensuring reproducibility of experimental results is difficult. We discuss these findings and conclude that to ensure reproducibility, the recommender-system community needs to (1) survey other research fields and learn from them, (2) find a common understanding of reproducibility, (3) identify and understand the determinants that affect reproducibility, (4) conduct more comprehensive experiments, (5) modernize publication practices, (6) foster the development and use of recommendation frameworks, and (7) establish best-practice guidelines for recommender-systems research.
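
A minimal sketch may make the kind of determinant analysis described above more concrete. The following Python snippet is not the authors' code; user_docs, candidate_docs, and clicked are hypothetical inputs. It evaluates one content-based filtering approach while sweeping two of the determinants named in the abstract, the term-weighting scheme and the user-model size, instead of reporting a single configuration:

    from itertools import product

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def precision_at_10(user_docs, candidate_docs, clicked, weighting, model_size):
        # Build a user model from the user's documents under one combination of
        # determinants: the term-weighting scheme and the user-model size.
        vectorizer = (TfidfVectorizer(max_features=model_size) if weighting == "tf-idf"
                      else CountVectorizer(max_features=model_size))
        matrix = vectorizer.fit_transform(user_docs + candidate_docs)
        user_model = np.asarray(matrix[:len(user_docs)].mean(axis=0))
        # Rank candidates by similarity to the user model and measure
        # precision@10 against the candidates the user actually clicked.
        scores = cosine_similarity(user_model, matrix[len(user_docs):])[0]
        top10 = np.argsort(scores)[::-1][:10]
        return len(set(top10) & clicked) / 10.0

    # user_docs and candidate_docs are lists of strings, clicked is a set of
    # candidate indices; all three are assumed to be loaded from some dataset.
    for weighting, size in product(["tf", "tf-idf"], [10, 50, 250, 1000]):
        p10 = precision_at_10(user_docs, candidate_docs, clicked, weighting, size)
        print(f"{weighting:>6}  model size {size:>4}  precision@10 = {p10:.3f}")

If the ranking of configurations changes when the same sweep is run on a second dataset, that is exactly the kind of non-reproducibility the experiments in this article document.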


Notes

  1. Some of the definitions have been previously introduced by Beel (2015).

  2. http://plista.com.

  3. By “well-performing” we mean that if an algorithm was the most effective on a particular news site, it should also be the most effective on other news sites, or at least be among the most effective.

  4. For more details on the algorithms and the evaluation, refer to Lommatzsch (2014a, b).

  5. Please note that the results in this section are not statistically significant; a further analysis based on more data is required (a minimal sketch of such a significance check follows these notes).

  6. Several suggestions are inspired by Ekstrand et al. (2011b) and Konstan and Adomavicius (2013).
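
As a concrete version of the check note 5 calls for, a two-proportion z-test on click counts could be used to decide whether an observed difference between two algorithms is statistically significant. This is a minimal sketch; the click and impression counts below are placeholders, not data from the article:

    from statsmodels.stats.proportion import proportions_ztest

    # Hypothetical clicks and impressions for two recommendation algorithms.
    clicks = [412, 380]
    impressions = [100000, 100000]

    # Two-sided test of whether the two click-through rates differ.
    z_stat, p_value = proportions_ztest(clicks, impressions)
    print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
    if p_value >= 0.05:
        print("Difference is not significant at the 5% level; more data is needed.")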

References

  • Al-Maskari, A., Sanderson, M., Clough, P.: The relationship between IR effectiveness measures and user satisfaction. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 773–774. ACM, New York (2007)

  • Amatriain, X., Pujol, J., Oliver, N.: I like it... I like it not: evaluating user ratings noise in recommender systems. In: Carberry, S., Weibelzahl, S., Micarelli, A., Semeraro, G. (eds.) User Modeling, Adaptation, and Personalization, pp. 247–258. Springer, Berlin (2009)

  • Beel, J.: Towards effective research-paper recommender systems and user modeling based on mind maps. PhD Thesis. Otto-von-Guericke Universität Magdeburg (2015)

  • Beel, J., Langer, S.: A comparison of offline evaluations, online evaluations, and user studies in the context of research-paper recommender systems. In: Kapidakis, S., Mazurek, C., Werla, M. (eds.) Proceedings of the 19th International Conference on Theory and Practice of Digital Libraries (TPDL). Lecture Notes in Computer Science, pp. 153–168 (2015). doi:10.1007/978-3-319-24592-8_12

  • Beel, J., Gipp, B., Shaker, A., Friedrich, N.: SciPlore Xtract: extracting titles from scientific PDF documents by analyzing style information (font size). In: Lalmas, M., Jose, J., Rauber, A., Sebastiani, F., Frommholz, I. (eds.) Research and Advanced Technology for Digital Libraries. Proceedings of the 14th European Conference on Digital Libraries (ECDL’10). Lecture Notes in Computer Science (LNCS), pp. 413–416. Springer, Glasgow (2010)

  • Beel, J., Gipp, B., Langer, S., Genzmehr, M.: Docear: an academic literature suite for searching, organizing and creating academic literature. In: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL). JCDL’11, pp. 465–466. ACM, New York (2011). doi:10.1145/1998076.1998188

  • Beel, J., Langer, S., Genzmehr, M.: Sponsored versus organic (Research Paper) recommendations and the impact of labeling. In: Aalberg, T., Dobreva, M., Papatheodorou, C., Tsakonas, G., Farrugia, C. (eds.) Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 395–399. Springer, Valletta (2013)

  • Beel, J., Langer, S., Genzmehr, M., Gipp, B., Breitinger, C., Nürnberger, A.: Research paper recommender system evaluation: a quantitative literature survey. In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys). ACM International Conference Proceedings Series (ICPS), pp. 15–22. ACM, New York (2013b). doi:10.1145/2532508.2532512

  • Beel, J., Langer, S., Genzmehr, M., Gipp, B., Nürnberger, A.: A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation. In: Proceedings of the Workshop on Reproducibility and Replication in Recommender Systems Evaluation (RepSys) at the ACM Recommender System Conference (RecSys). ACM International Conference Proceedings Series (ICPS), pp. 7–14 (2013c). doi:10.1145/2532508.2532511

  • Beel, J., Langer, S., Genzmehr, M., Müller, C.: Docears PDF inspector: title extraction from PDF files. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’13), pp. 443–444. ACM, New York (2013d). doi:10.1145/2467696.2467789

  • Beel, J., Langer, S., Genzmehr, M., Nürnberger, A.: Introducing Docear’s research paper recommender system. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’13), pp. 459–460. ACM, New York (2013e). doi:10.1145/2467696.2467786

  • Beel, J., Langer, S., Genzmehr, M., Nürnberger, A.: Persistence in recommender systems: giving the same recommendations to the same users multiple times. In: Aalberg, T., Dobreva, M., Papatheodorou, C., Tsakonas, G., Farrugia, C. (eds.) Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013). Lecture Notes in Computer Science (LNCS), pp. 390–394. Springer, Valletta (2013f)

  • Beel, J., Langer, S., Nürnberger, A., Genzmehr, M.: The impact of demographics (age and gender) and other user characteristics on evaluating recommender systems. In: Aalberg, T., Dobreva, M., Papatheodorou, C., Tsakonas, G., Farrugia, C. (eds.) Proceedings of the 17th International Conference on Theory and Practice of Digital Libraries (TPDL 2013), pp. 400–404. Springer, Valletta (2013)

  • Beel, J., Langer, S., Genzmehr, M., Gipp, B.: Utilizing mind-maps for information retrieval and user modelling. In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P., Houben, G.-J. (eds.) Proceedings of the 22nd Conference on User Modelling, Adaption, and Personalization (UMAP). Lecture Notes in Computer Science, pp. 301–313. Springer, Berlin (2014a). doi:10.1007/978-3-319-08786-3_26

  • Beel, J., Langer, S., Gipp, B., Nürnberger, A.: The architecture and datasets of Docear’s research paper recommender system. D-Lib Magazine 20, 11/12 (2014b). doi:10.1045/november14-beel

  • Beel, J., Gipp, B., Langer, S., Breitinger, C.: Research paper recommender systems: a literature survey. Int. J. Digital Libr. 1–34 (2015a). doi:10.1007/s00799-015-0156-0

  • Beel, J., Langer, S., Kapitsaki, G.M., Breitinger, C., Gipp, B.: Exploring the potential of user modeling based on mind maps. In: Ricci, F., Bontcheva, K., Conlan, O., Lawless, S. (eds.) Proceedings of the 23rd Conference on User Modelling, Adaptation and Personalization (UMAP). Lecture Notes in Computer Science, pp. 3–17. Springer, Berlin (2015b). doi:10.1007/978-3-319-20267-9_1

  • Bellogin, A., Castells, P., Said, A., Tikk, D.: Report on the workshop on reproducibility and replication in recommender systems evaluation (RepSys). In: ACM SIGIR forum, pp. 29–35. ACM, New York (2014)

  • Bethard, S., Jurafsky, D.: Who should I cite: learning literature search models from citation behavior. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 609–618. ACM, New York (2010)

  • Bobadilla, J., Ortega, F., Hernando, A., Gutiérrez, A.: Recommender systems survey. Knowl.-Based Syst. 46, 109–132 (2013)

  • Bogers, T., van den Bosch, A.: Comparing and evaluating information retrieval algorithms for news recommendation. In: RecSys’07, pp. 141–144. ACM, Minneapolis (2007)

  • Bogers, T., van den Bosch, A.: Recommending scientific articles using citeulike. In: Proceedings of the 2008 ACM conference on Recommender systems, pp. 287–290. ACM, New York (2008)

  • Bollen, J., Rocha, L.M.: An adaptive systems approach to the implementation and evaluation of digital library recommendation systems. In: Proceedings of the 4th European Conference on Digital Libraries, pp. 356–359. Springer, Berlin (2000)

  • Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43–52. Morgan Kaufmann, San Francisco (1998)

  • Buckheit, J.B., Donoho, D.L.: Wavelab and reproducible research. Wavelets and Statistics. Lecture Notes in Statistics, pp. 55–81. Springer, Berlin (1995)

  • Burns, A.C., Bush, R.F.: Marketing Research, 7th edn. Prentice Hall, Upper Saddle River (2013)

  • Casadevall, A., Fang, F.C.: Reproducible science. Infect. Immun. 78(12), 4972–4975 (2010)

  • CiteULike. My Top Recommendations. Webpage (http://www.citeulike.org/profile/joeran/recommendations) (2011)

  • Cremonesi, P., Garzotto, F., Negro, S., Papadopoulos, A.V., Turrin, R.: Looking for “good” recommendations: a comparative evaluation of recommender systems. In: Human-Computer Interaction-INTERACT 2011, pp. 152–168. Springer, Berlin (2011)

  • Cremonesi, P., Garzotto, F., Turrin, R.: Investigating the persuasion potential of recommender systems from a quality perspective: An empirical study. ACM Trans. Interact. Intell. Syst. 2(2), 1–11 (2012)

  • Davies, M.: Concept mapping, mind mapping and argument mapping: what are the differences and do they matter? High. Educ. 62(3), 279–301 (2011)

  • Deyo, R.A., Diehr, P., Patrick, D.L.: Reproducibility and responsiveness of health status measures statistics and strategies for evaluation. Control. Clin. Trials 12(4), S142–S158 (1991)

  • Dong, R., Tokarchuk, L., Ma, A.: Digging friendship: paper recommendation in social network. In: Proceedings of Networking & Electronic Commerce Research Conference (NAEC 2009), pp. 21–28 (2009)

  • Domingues Garcia, R., Bender, M., Anjorin, M., Rensing, C., Steinmetz, R.: FReSET: an evaluation framework for folksonomy-based recommender systems. In: Proceedings of the 4th ACM RecSys Workshop on Recommender Systems and the Social Web, pp. 25–28. ACM, New York (2012)

  • Downing, S.M.: Reliability: on the reproducibility of assessment data. Med. Educ. 38(9), 1006–1012 (2004)

  • Drummond, C.: Replicability is not reproducibility: nor is it good science. In: Proceedings of the Evaluation Methods for Machine Learning Workshop at the 26th ICML (2009)

  • Ekstrand, M.D., Kannan, P., Stemper, J.A., Butler, J.T., Konstan, J.A., Riedl, J.T.: Automatically building research reading lists. In: Proceedings of the Fourth ACM Conference on Recommender Systems, pp. 159–166. ACM, New York (2010)

  • Eckart de Castilho, R., Gurevych, I.: A lightweight framework for reproducible parameter sweeping in information retrieval. In: Proceedings of the 2011 Workshop on Data InfrastructurEs for Supporting Information Retrieval Evaluation, pp. 7–10. ACM, New York (2011)

  • Ekstrand, M.D., Ludwig, M., Kolb, J., Riedl, J.T.: LensKit: a modular recommender framework. In: Proceedings of the fifth ACM Conference on Recommender Systems, pp. 349–350. ACM, New York (2011a)

  • Ekstrand, M.D., Ludwig, M., Konstan, J.A., Riedl, J.T.: Rethinking the recommender research ecosystem: reproducibility, openness, and LensKit. In: Proceedings of the fifth ACM Conference on Recommender Systems, pp. 133–140. ACM, New York (2011b)

  • Felfernig, A., Jeran, M., Ninaus, G., Reinfrank, F., Reiterer, S.: Toward the next generation of recommender systems: applications and research challenges. In: Multimedia Services in Intelligent Environments, pp. 81–98. Springer, Berlin (2013)

  • Flyvbjerg, B.: Making Social Science Matter: Why Social Inquiry Fails and How It Can Succeed Again. Cambridge University Press, Cambridge (2001)

  • Gantner, Z., Rendle, S., Freudenthaler, C., Schmidt-Thieme, L.: MyMediaLite: a free recommender system library. In: Proceedings of the Fifth ACM Conference on Recommender Systems, pp. 305–308. ACM, New York (2011)

  • Ge, M., Delgado-Battenfeld, C., Jannach, D.: Beyond accuracy: evaluating recommender systems by coverage and serendipity. In: Proceedings of the Fourth ACM Conference on Recommender Systems, pp. 257–260. ACM, New York (2010)

  • Gunawardana, A., Shani, G.: A survey of accuracy evaluation metrics of recommendation tasks. J. Mach. Learn. Res. 10, 2935–2962 (2009)

  • Guo, G., Zhang, J., Sun, Z., Yorke-Smith, N.: LibRec: a Java library for recommender systems. In: Posters, Demos, Late-breaking Results and Workshop Proceedings of the 23rd International Conference on User Modeling, Adaptation and Personalization (2015)

  • Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)

  • Hahsler, M.: Recommenderlab: a framework for developing and testing recommendation algorithms. (2011). https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf

  • Hawking, D., Craswell, N., Thistlewaite, P., Harman, D.: Results and challenges in web search evaluation. Comput. Netw. 31(11), 1321–1330 (1999)

  • Hayes, C., Massa, P., Avensani, P., Cunningham, P.: An on-line evaluation framework for recommender systems. In: Proceedings of the AH’2002 Workshop on Recommendation and Personalization in eCommerce. Department of Computer Science, Trinity College Dublin (2002)

  • He, Q., Pei, J., Kifer, D., Mitra, P., Giles, L.: Context-aware citation recommendation. In: Proceedings of the 19th International Conference on World Wide Web, pp. 421–430. ACM, New York (2010)

  • He, J., Nie, J.-Y., Lu, Y., Zhao, W.X.: Position-aligned translation model for citation recommendation. In: Proceedings of the 19th International Conference on String Processing and Information Retrieval, pp. 251–263. Springer, Berlin (2012)

  • Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. 22(1), 5–53 (2004)

  • Hersh, W. et al.: Do batch and user evaluations give the same results? In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 17–24. ACM, New York (2000a)

  • Hersh, W.R. et al.: Further analysis of whether batch and user evaluations give the same results with a question-answering task. In: Proceedings of the Ninth Text REtrieval Conference (TREC 9), pp. 16–25 (2000b)

  • Hoeymans, N., Wouters, E.R.C.M., Feskens, E.J.M., van den Bos, G.A.M., Kromhout, D.: Reproducibility of performance-based and self-reported measures of functional status. J. Gerontol. Ser. A 52(6), M363–M368 (1997)

  • Hofmann, K., Schuth, A., Bellogin, A., de Rijke, M.: Effects of position bias on click-based recommender evaluation. In: Advances in Information Retrieval, pp. 624–630. Springer, Berlin (2014)

  • Holland, B., Holland, L., Davies, J.: An investigation into the concept of mind mapping and the use of mind mapping software to support and improve student academic performance, pp. 89–94. Centre for Learning and Teaching - Learning and Teaching Project Report, University of Wolverhampton, Wolverhampton (2004)

  • Huang, W., Kataria, S., Caragea, C., Mitra, P., Giles, C.L., Rokach, L.: Recommending citations: translating papers into references. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1910–1914. ACM, New York (2012)

  • Jannach, D.: Recommender systems: an introduction. Lecture Slides (PhD School 2014) (2014)

  • Jannach, D., Zanker, M., Ge, M., Gröning, M.: Recommender systems in computer science and information systems: a landscape of research. In: Proceedings of the 13th International Conference on Electronic Commerce and Web Technologies (EC-Web), pp. 76–87. Springer, Berlin (2012)

  • Jannach, D., Lerche, L., Gedikli, F., Bonnin, G.: What recommenders recommend: an analysis of accuracy, popularity, and sales diversity effects. In: Carberry, S., Weibelzahl, S., Micarelli, A., Semeraro, G. (eds.) User Modeling, Adaptation, and Personalization, pp. 25–37. Springer, Heidelberg (2013)

  • Knijnenburg, B.P., Willemsen, M.C., Kobsa, A.: A pragmatic procedure to support the user-centric evaluation of recommender systems. In: Proceedings of the fifth ACM Conference on Recommender Systems, pp. 321–324. ACM, New York (2011)

  • Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C.: Explaining the user experience of recommender systems. User Model. User-Adap. Inter. 22(4–5), 441–504 (2012)

  • Koers, H., Gabriel, A., Capone, R.: Executable papers in computer science go live on ScienceDirect (2013). https://www.elsevier.com/connect/executable-papers-in-computer-science-go-live-on-sciencedirect

  • Konstan, J.A., Adomavicius, G.: Toward identification and adoption of best practices in algorithmic recommender systems research. In: Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation, pp. 23–28. ACM, New York (2013)

  • Konstan, J., Ekstrand, M.D.: Introduction to Recommender Systems. Coursera Lecture Slides (2015)

  • Konstan, J.A., Riedl, J.: Recommender systems: from algorithms to user experience. User Model. User-Adap. Inter. 22(1–2), 1–23 (2012)

  • Kowald, D., Lacic, E., Trattner, C.: Tagrec: towards a standardized tag recommender benchmarking framework. In: Proceedings of the 25th ACM Conference on Hypertext and Social Media, pp. 305–307. ACM, New York (2014)

  • Langer, S., Beel, J.: The Comparability of recommender system evaluations and characteristics of Docear’s users. In: Proceedings of the Workshop on Recommender Systems Evaluation: Dimensions and Design (REDD) at the 2014 ACM Conference Series on Recommender Systems (RecSys). CEUR-WS, pp. 1–6 (2014)

  • Lommatzsch, A.: Real-time news recommendation using context-aware ensembles. In: Proceedings of the 36th European Conference on Information Retrieval (ECIR), pp. 51–62. Springer, New York (2014a)

  • Lommatzsch, A.: Real-time news recommendation using context-aware ensembles. PowerPoint Presentation, http://euklid.aot.tu-berlin.de/andreas/20140414__ECIR/20140414__Lommatzsch-ECIR2014.pdf (2014b)

  • Lu, Y., He, J., Shan, D., Yan, H.: Recommending citations with translation model. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 2017–2020. ACM, New York (2011)

  • Manouselis, N., Verbert, K.: Layered evaluation of multi-criteria collaborative filtering for scientific paper recommendation. In: Procedia Computer Science, pp. 1189–1197. Elsevier, New York (2013)

  • McNee, S.M. et al.: On the recommending of citations for research papers. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work, pp. 116–125. ACM, New Orleans (2002). doi:10.1145/587078.587096

  • McNee, S.M., Kapoor, N., Konstan, J.A.: Don’t look stupid: avoiding pitfalls when recommending research papers. In: Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, pp. 171–180. ACM, New York (2006)

  • McNutt, M.: Reproducibility. Science 343(6168), 229–229 (2014)

  • Melville, P., Mooney, R.J., Nagarajan, R.: Content-boosted collaborative filtering for improved recommendations. In: Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp. 187–192. AAAI Press, Menlo Park (2002)

  • Open Science Collaboration: Estimating the reproducibility of psychological science. Science 349(6251) (2015). doi:10.1126/science.aac4716

  • Pennock, D.M., Horvitz, E., Lawrence, S., Giles, C.L.: Collaborative filtering by personality diagnosis: a hybrid memory- and model-based approach. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 473–480. Morgan Kaufmann Publishers Inc., Burlington (2000)

  • Popper, K.: The Logic of Scientific Discovery. Hutchinson, London (1959)

  • Pu, P., Chen, L., Hu, R.: A user-centric evaluation framework for recommender systems. In: Proceedings of the fifth ACM Conference on Recommender Systems, pp. 157–164. ACM, New York (2011)

  • Pu, P., Chen, L., Hu, R.: Evaluating recommender systems from the user’s perspective: survey of the state of the art. User Model. User-Adapt. Interaction 22, 1–39 (2012)

  • Rehman, J.: Cancer research in crisis: Are the drugs we count on based on bad science? (2013). http://www.salon.com/2013/09/01/is_cancer_research_facing_a_crisis/

  • Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pp. 175–186. ACM, New York (1994)

  • Ricci, F., Rokach, L., Shapira, B., Kantor, P.B.: Recommender Systems Handbook. Springer, New York (2011)

  • Ricci, F., Rokach, L., Shapira, B., Kantor, P.B.: Recommender Systems Handbook, 2nd edn. Springer, New York (2015)

  • Rich, E.: User modeling via stereotypes. Cognit. Sci. 3(4), 329–354 (1979)

  • Rothwell, P.M., Martyn, C.N.: Reproducibility of peer review in clinical neuroscience. Brain 123(9), 1964–1969 (2000)

  • Said, A.: Evaluating the Accuracy and Utility of Recommender Systems. PhD Thesis. Technische Universität Berlin (2013)

  • Said, A., Bellogin, A.: Rival: a toolkit to foster reproducibility in recommender system evaluation. In: Proceedings of the 8th ACM Conference on Recommender Systems, pp. 371–372. ACM, New York (2014)

  • Schein, A.I., Popescul, A., Ungar, L.H., Pennock, D.M.: Methods and metrics for cold-start recommendations. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 253–260. ACM, New York (2002)

  • Schmidt, S.: Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Rev. Gen. Psychol. 13(2), 90 (2009)

  • Shani, G., Gunawardana, A.: Evaluating recommendation systems. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 257–297. Springer, New York (2011)

  • Sharma, L., Gera, A.: A survey of recommendation systems: research challenges. Int. J. Eng. Trends Technol. 4(5), 1989–1992 (2013)

  • Shi, Y., Larson, M., Hanjalic, A.: Collaborative filtering beyond the user-item matrix: a survey of the state of the art and future challenges. ACM Comput. Surv. 47(1), 3:1–3:45 (2014). doi:10.1145/2556270

  • Sonnenburg, S., et al.: The need for open source software in machine learning. J. Mach. Learn. Res. 8, 2443–2466 (2007)

  • Thomas, D., Greenberg, A., Calarco, P.: Scholarly usage based recommendations: evaluating bX for a consortium presentation. http://igelu.org/wp-content/uploads/2011/09/bx_igelu_presentation_updated_september-13.pdf. (2011)

  • Torres, R., McNee, S.M., Abel, M., Konstan, J.A., Riedl, J.: Enhancing digital libraries with TechLens+. In: Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 228–236. ACM, New York (2004)

  • Turpin, A.H., Hersh, W.: Why batch and user evaluations do not give the same results. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 225–231. ACM, New York (2001)

  • Voorhees, E.M.: TREC: improving information access through evaluation. Bull. Am. Soc. Inf. Sci. Technol. 32(1), 16–21 (2005)

  • Zarrinkalam, F., Kahani, M.: SemCiR: a citation recommendation system based on a novel semantic distance measure. Program 47(1), 92–112 (2013)

  • Zheng, H., Wang, D., Zhang, Q., Li, H., Yang, T.: Do clicks measure recommendation relevancy? An empirical user study. In: Proceedings of the fourth ACM Conference on Recommender Systems, pp. 249–252. ACM, New York (2010)

Author information

Corresponding author

Correspondence to Joeran Beel.

About this article

Cite this article

Beel, J., Breitinger, C., Langer, S. et al. Towards reproducibility in recommender-systems research. User Model User-Adap Inter 26, 69–101 (2016). https://doi.org/10.1007/s11257-016-9174-x

