Generating post hoc review-based natural language justifications for recommender systems


In this article, we present a framework to build post hoc natural language justifications that support the suggestions generated by a recommendation algorithm. Our methodology is based on the intuition that review excerpts contain much relevant information that can be used to justify a recommendation. We therefore propose a black-box explanation strategy that takes as input a recommended item and a set of reviews and produces as output a post hoc natural language justification that is completely independent of the underlying recommendation model. To validate our claims, we also introduce three different implementations of our conceptual framework. The first uses natural language processing and sentiment analysis techniques to identify relevant and distinguishing aspects discussed in the reviews, and combines review excerpts mentioning these aspects into a natural language justification that is presented to the target user. The second extends the first by introducing automatic aspect extraction and text summarization, which are exploited to generate a single synthesis of the main characteristics of the item; this synthesis is used as the justification. Finally, the third tackles the problem of generating a context-aware justification, that is to say, a justification that varies with the contextual situation, by automatically learning a lexicon for each contextual setting and by using that lexicon to diversify the justifications. In the experimental evaluation, we carried out three user studies in different domains. The results showed that our methodology makes the recommendation process more transparent, engaging and trustworthy for the users, thus confirming the validity of the intuitions behind this work.
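To make the black-box idea concrete, the following is a minimal sketch of a review-based justification builder, loosely in the spirit of the first implementation. It is not the authors' pipeline: the tiny sentiment lexicon, the hand-supplied list of aspects, the ranking by number of positive mentions and the wording of the template are all assumptions made for illustration, as are the function names `split_sentences`, `polarity` and `justify`.

```python
import re

# Toy sentiment lexicon: an assumption for illustration, not the paper's algorithm.
POSITIVE = {"great", "excellent", "amazing", "brilliant", "good"}
NEGATIVE = {"bad", "boring", "poor", "terrible", "weak"}


def split_sentences(review):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", review) if s.strip()]


def polarity(sentence):
    """Lexicon-based polarity: positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def justify(item_name, reviews, aspects, top_k=2):
    """Build a template-based justification from positive review excerpts.

    Only the recommended item and its reviews are needed, so the function
    is independent of whatever model produced the recommendation.
    """
    excerpts = {}
    for review in reviews:
        for sentence in split_sentences(review):
            if polarity(sentence) <= 0:
                continue  # skip neutral and negative sentences
            words = set(re.findall(r"[a-z']+", sentence.lower()))
            for aspect in aspects:
                if aspect in words:
                    excerpts.setdefault(aspect, []).append(sentence)
    # Rank aspects by the number of positive sentences mentioning them
    # (ties broken alphabetically); a hypothetical ranking criterion.
    ranked = sorted(excerpts, key=lambda a: (-len(excerpts[a]), a))[:top_k]
    if not ranked:
        return f"I suggest {item_name}."
    parts = [
        f'people who liked it say about the {a}: "{excerpts[a][0]}"'
        for a in ranked
    ]
    return f"I suggest {item_name} because " + "; ".join(parts) + "."
```

For example, calling `justify("Movie X", ["The acting is great. The plot is boring."], ["acting", "plot"])` keeps the positive sentence about the acting and discards the negative one about the plot. A real system would replace the lexicon with a proper sentiment analyzer and extract the aspects automatically.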




  5. It is worth noting that this aspect is neither investigated nor evaluated in the current research; it is left for future work.

  6. As previously stated, we will investigate the algorithm-independent nature of the methodology in future work, by evaluating the effectiveness of the justifications across the different algorithms used to generate the recommendations.

  7. A discussion of the available grammatical categories is outside the scope of this paper. However, one of the most popular tagsets, the Penn Treebank tagset, contains 36 categories for words. We refer the reader to Marcus et al. (1994) for further reading about POS tagging.

  8. Sentences conveying a neutral opinion were ignored. More details about the sentiment analysis algorithm used in the pipeline are provided in the next section.

  9. Just as the IDF is based on the number of documents that contain a term, the IAF is based on the number of items for which at least one review discusses the aspect a. The lower that number, the higher the IAF score.

  10. In the experimental evaluation, we compared two templates that include a different number of aspects. For the sake of simplicity, we report only one of them, since they differ only in a small portion of the grammar.

  13. Actually, this problem is common to almost every approach for generating explanations presented in the literature so far.

  14. Actually, one of the experiments discussed in the next section evaluates the impact of more complex generation strategies, namely text summarization techniques. In this case, we decided to use a basic template-based structure rather than a more sophisticated generation strategy, in order to retain more control over, and a better understanding of, the outcomes of the experiments. Rather than changing two modules, namely Aspect Extraction and Generation, we preferred to keep the generation phase of the NLP-Pipeline unchanged and only changed the strategy for Aspect Extraction and Ranking by making it context-aware.

  17. Only the reviews available in the ‘Movies and TV’ and ‘Books’ categories were downloaded.

  18. Acronym for Amazon Standard Identification Number.

  23. The platform is available online and the experimental protocol can still be run.

  26. Of course, as future work, we plan to evaluate the effectiveness of context-aware justifications for movie and book recommendation as well.

  27. Available only in Italian.
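Footnote 9 describes the IAF only informally, by analogy with the IDF. The sketch below shows one way such a score could be computed, assuming the usual logarithmic form log(N / n_a), where N is the number of items and n_a is the number of items with at least one review discussing aspect a. The exact formula is not given in this text, so both the logarithmic form and the function name `inverse_aspect_frequency` are assumptions.

```python
import math


def inverse_aspect_frequency(item_aspects):
    """Return a dict mapping each aspect a to log(N / n_a).

    item_aspects maps an item id to the collection of aspects discussed in
    at least one of its reviews. N is the number of items and n_a is the
    number of items whose reviews mention aspect a. The log(N / n_a) form
    is assumed by analogy with IDF; it is not taken from the paper.
    """
    n_items = len(item_aspects)
    counts = {}
    for aspects in item_aspects.values():
        for aspect in set(aspects):  # count each aspect once per item
            counts[aspect] = counts.get(aspect, 0) + 1
    return {a: math.log(n_items / n) for a, n in counts.items()}
```

Under this assumption, an aspect discussed for every item receives a score of zero, while an aspect discussed for few items receives a high score, which matches the footnote's remark that a lower count yields a higher IAF.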



  1. Adomavicius, G., Tuzhilin, A.: Context-aware recommender systems. In: Recommender Systems Handbook, pp. 217–253. Springer (2011)

  2. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: A Nucleus for a Web of Open Data. Springer, Berlin (2007)

  3. Balazs, J.A., Velásquez, J.D.: Opinion mining and information fusion: a survey. Inf. Fusion 27, 95–110 (2016)

  4. Baral, R., Zhu, X., Iyengar, S., Li, T.: ReEL: Review-aware explanation of location recommendation. In: Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, pp. 23–32. ACM (2018)

  5. Basile, P., Novielli, N.: Uniba: Sentiment analysis of English tweets combining micro-blogging, lexicon and semantic features. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 595–600 (2015)

  6. Bilgic, M., Mooney, R.J.: Explaining Recommendations: Satisfaction vs. Promotion. In: Beyond Personalization, IUI WS, vol. 5 (2005)

  7. Biran, O., Cotton, C.: Explanation and Justification in Machine Learning: A Survey. In: IJCAI-17 Workshop on Explainable AI (XAI), p. 8 (2017)

  8. Bizer, C.: The emerging web of linked data. IEEE Intell. Syst. 24(5), 87–92 (2009)

  9. Chang, S., Harper, F.M., Terveen, L.G.: Crowd-based personalized natural language explanations for recommendations. In: Proceedings of the 10th ACM Conference on Recommender Systems, pp. 175–182. ACM (2016)

  10. Chen, C., Zhang, M., Liu, Y., Ma, S.: Neural attentional rating regression with review-level explanations. In: Proceedings of the 2018 World Wide Web Conference, pp. 1583–1592. International World Wide Web Conferences Steering Committee (2018)

  11. Chen, G., Chen, L.: Augmenting service recommender systems by incorporating contextual opinions from user reviews. User Model. User Adapt. Interaction 25(3), 295–329 (2015)

  12. Chen, L., Chen, G., Wang, F.: Recommender systems based on user reviews: the state of the art. User Model. User Adapt. Interaction 25(2), 99–154 (2015)

  13. Chen, L., Wang, F.: Explaining recommendations based on feature sentiments in product reviews. In: Proceedings of the 22nd International Conference on Intelligent User Interfaces, pp. 17–28. ACM (2017)

  14. Coyle, M., Smyth, B.: Explaining search results. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence, IJCAI’05, pp. 1553–1555. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2005)

  15. Cramer, H., Evers, V., Ramlal, S., Van Someren, M., Rutledge, L., Stash, N., Aroyo, L., Wielinga, B.: The effects of transparency on trust and acceptance of a content-based art recommender. User Model. User Adapt. Interaction 18(5), 455–496 (2008)

  16. De Filippo, A., Lombardi, M., Milano, M.: Non-linear optimization of business models in the electricity market. In: International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pp. 81–97. Springer (2016)

  17. Friedrich, G., Zanker, M.: A taxonomy for generating explanations in recommender systems. AI Mag. 32(3), 90–98 (2011)

  18. Gedikli, F., Jannach, D., Ge, M.: How should I explain? A comparison of different explanation types for recommender systems. Int. J. Hum. Comput. Stud. 72(4), 367–382 (2014)

  19. Goodman, B., Flaxman, S.: European Union Regulations on Algorithmic Decision-Making and a “Right to Explanation”. arXiv preprint arXiv:1606.08813 (2016)

  20. Guha, R., Gupta, V., Raghunathan, V., Srikant, R.: User Modeling for a personal assistant. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 275–284. ACM (2015)

  21. Haveliwala, T.H.: Topic-sensitive PageRank: a context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng. 15(4), 784–796 (2003)

  22. He, X., Chen, T., Kan, M.Y., Chen, X.: Trirank: Review-aware explainable recommendation by modeling aspects. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1661–1670. ACM (2015)

  23. Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining Collaborative Filtering Recommendations. In: CSCW, pp. 241–250 (2000)

  24. Hernández-Rubio, M., Cantador, I., Bellogín, A.: A comparative analysis of recommender systems based on item aspect opinions extracted from user reviews. User Model. User Adapt. Interaction, pp. 1–61 (2018)

  25. Jannach, D., Zanker, M., Felfernig, A., Friedrich, G.: Recommender Systems: An Introduction. Cambridge University Press, Cambridge (2010)

  26. Johnson, H., Johnson, P.: Explanation facilities and interactive systems. In: Proceedings of the 1st International Conference on Intelligent user Interfaces, pp. 159–166. ACM (1993)

  27. Knijnenburg, B., Bostandjiev, S., O’Donovan, J., Kobsa, A.: Inspectability and control in social recommenders. In: RecSys 2012, pp. 43–50 (2012)

  28. Knijnenburg, B., Willemsen, M.: Evaluating recommender systems with user experiments. In: Recommender Systems Handbook, pp. 309–352. Springer (2015)

  29. Kullback, S., Leibler, R.A.: On information and sufficiency. Annals Math. Stat. 22(1), 79–86 (1951)

  30. Liu, B., Zhang, L.: A survey of opinion mining and sentiment analysis. In: Mining Text Data, pp. 415–463. Springer (2012)

  31. Lu, Y., Dong, R., Smyth, B.: Why I like it: multi-task learning for recommendation and explanation. In: Proceedings of the 12th ACM Conference on Recommender Systems, pp. 4–12 (2018)

  32. Marcus, M., Kim, G., Marcinkiewicz, M.A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: annotating predicate argument structure. In: Proceedings of the Workshop on Human Language Technology, pp. 114–119. Association for Computational Linguistics (1994)

  33. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)

  34. Misztal, J., Indurkhya, B.: Explaining contextual recommendations: Interaction design study and prototype implementation. In: IntRS@ RecSys, pp. 13–20 (2015)

  35. Muhammad, K.I., Lawlor, A., Smyth, B.: A live-user study of opinionated explanations for recommender systems. In: Proceedings of the 21st International Conference on Intelligent User Interfaces, pp. 256–260. ACM (2016)

  36. Musto, C., de Gemmis, M., Semeraro, G., Lops, P.: A multi-criteria recommender system exploiting aspect-based sentiment analysis of users’ reviews. In: Proceedings of the eleventh ACM conference on recommender systems, pp. 321–325 (2017)

  37. Musto, C., Narducci, F., Lops, P., De Gemmis, M., Semeraro, G.: ExpLOD: a framework for explaining recommendations based on the linked open data cloud. In: Proceedings of the 10th ACM Conference on Recommender Systems, RecSys ’16, pp. 151–154. ACM, New York, NY, USA (2016)

  38. Musto, C., Narducci, F., Lops, P., de Gemmis, M., Semeraro, G.: Linked open data-based explanations for transparent recommender systems. Int. J. Hum. Comput. Stud. 121, 93–107 (2019)

  39. Musto, C., Rossiello, G., de Gemmis, M., Lops, P., Semeraro, G.: Combining text summarization and aspect-based sentiment analysis of users’ reviews to justify recommendations. In: Proceedings of the 13th ACM Conference on Recommender Systems, pp. 383–387 (2019)

  40. Nakagawa, H., Mori, T.: A simple but powerful automatic term extraction method. In: Coling 2002: Second International Workshop on Computational Terminology, Vol. 14, pp. 1–7. Association for Computational Linguistics (2002)

  41. Nunes, I., Jannach, D.: A systematic review and taxonomy of explanations in decision support and recommender systems. User Model. User Adapt. Interaction 27(3–5), 393–444 (2017)

  42. Qiu, F., Cho, J.: Automatic identification of user interest for personalized search. In: Proceedings of the 15th International Conference on World Wide Web, pp. 727–736. ACM (2006)

  43. Radev, D.R., Jing, H., Sty, M., Tam, D.: Centroid-based summarization of multiple documents. Inf. Process. Manag. 40(6), 919–938 (2004)

  44. Rossiello, G., Basile, P., Semeraro, G.: Centroid-based text summarization through compositionality of word embeddings. In: G. Giannakopoulos, E. Lloret, J.M. Conroy, J. Steinberger, M. Litvak, P.A. Rankel, B. Favre (eds.) Proceedings of the Workshop on Summarization and Summary Evaluation Across Source Types and Genres, MultiLing@EACL 2017, Valencia, Spain, April 3, 2017, pp. 12–21. Association for Computational Linguistics (2017)

  45. Schilit, B.N., Adams, N., Want, R., et al.: Context-aware Computing Applications. Xerox Corporation, Palo Alto Research Center, Palo Alto (1994)

  46. Sinha, R., Swearingen, K.: The role of transparency in recommender systems. In: CHI’02 Extended Abstracts on Human Factors in Computing Systems, pp. 830–831. ACM (2002)

  47. Suglia, A., Greco, C., Musto, C., De Gemmis, M., Lops, P., Semeraro, G.: A deep architecture for content-based recommendations exploiting recurrent neural networks. In: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, pp. 202–211 (2017)

  48. Symeonidis, P., Nanopoulos, A., Manolopoulos, Y.: MoviExplain: a recommender system with explanations. In: Proceedings of the Third ACM Conference on Recommender Systems, pp. 317–320. ACM (2009)

  49. Tintarev, N., Masthoff, J.: A survey of explanations in recommender systems. In: 2007 IEEE 23rd International Conference on Data Engineering Workshop, pp. 801–810. IEEE (2007)

  50. Tintarev, N., Masthoff, J.: Evaluating the effectiveness of explanations for recommender systems. User Model. User Adapt. Interaction 22(4–5), 399–439 (2012)

  51. Vig, J., Sen, S., Riedl, J.: Tagsplanations: explaining recommendations using tags. In: Proceedings of the 14th International Conference on Intelligent User Interfaces, pp. 47–56. ACM (2009)

Author information



Corresponding author

Correspondence to Cataldo Musto.

Cite this article

Musto, C., de Gemmis, M., Lops, P. et al. Generating post hoc review-based natural language justifications for recommender systems. User Model User-Adap Inter (2020).


  • Recommender systems
  • Explanation
  • Text summarization
  • Natural language processing
  • Sentiment analysis