Learning to recommend similar items from human judgments

  • Christoph Trattner
  • Dietmar Jannach


Similar item recommendations, a common feature of many Web sites, point users to other interesting objects given a currently inspected item. A common way of computing such recommendations is to use a similarity function, which expresses how much alike two given objects are. Such similarity functions are usually designed based on the specifics of the given application domain. In this work, we explore how such functions can be learned from human judgments of the similarity between objects, using two "quality and taste" domains, cooking recipe and movie recommendation, as guiding scenarios. In our approach, we first collect a few thousand pairwise similarity assessments with the help of crowdworkers. Using these data, we then train different machine learning models that can be used as similarity functions to compare objects. Offline analyses for both application domains reveal that models which combine different types of item characteristics are the best predictors of human-perceived similarity. To further validate the usefulness of the learned models, we conducted additional user studies in which we exposed participants to similar item recommendations generated by models trained with different feature subsets. The results showed that the combined models with the best offline prediction performance not only led to the highest user-perceived similarity but also produced recommendations that participants considered useful, confirming the feasibility of our approach.
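The core idea of the approach — combining several feature-level similarity measures into one learned similarity function fitted to human judgments — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the feature columns, the toy data, and the plain least-squares fit are assumptions for demonstration; the paper compares a range of machine learning models.

```python
import numpy as np

# Hypothetical per-feature similarity scores for item pairs
# (e.g., ingredient overlap, title similarity, image similarity).
# Each row is one item pair; each column one feature-level measure.
X = np.array([
    [0.9, 0.8, 0.7],
    [0.2, 0.1, 0.3],
    [0.6, 0.5, 0.4],
    [0.8, 0.9, 0.6],
])

# Human similarity judgments for the same pairs, e.g., the mean
# crowdworker rating rescaled to [0, 1].
y = np.array([0.85, 0.15, 0.5, 0.8])

# Fit a linear model (with intercept) by least squares: the learned
# weights express how strongly each feature-level similarity
# contributes to human-perceived similarity.
A = np.hstack([X, np.ones((X.shape[0], 1))])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

def learned_similarity(feature_sims):
    """Predict perceived similarity for a new item pair from its
    feature-level similarity scores."""
    return float(np.dot(weights[:-1], feature_sims) + weights[-1])
```

At recommendation time, the learned function replaces a hand-designed similarity measure: candidate items are scored against the currently inspected item and ranked by predicted perceived similarity.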


Keywords: Similar item recommendations · Similarity measures · Content-based recommender systems · User studies




Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. University of Bergen, Bergen, Norway
  2. University of Klagenfurt, Klagenfurt, Austria
