Multimodal data fusion framework based on autoencoders for top-N recommender systems

  • Felipe L. A. Conceição
  • Flávio L. C. Pádua
  • Anisio Lacerda
  • Adriano C. Machado
  • Daniel H. Dalip


In this paper, we present a novel multimodal framework for video recommendation based on deep learning. Unlike most common solutions, we formulate video recommendation by simultaneously exploiting two data modalities: (i) the visual modality (i.e., the image sequence) and (ii) the textual modality, which, together with the audio stream, constitute the elementary data of a video document. More specifically, our framework first describes textual data using the bag-of-words and TF-IDF models and fuses those features with deep convolutional descriptors extracted from the visual data. As a result, we obtain a multimodal descriptor for each video document, from which we construct a low-dimensional sparse representation using autoencoders. To carry out the recommendation task, we extend a sparse linear method with side information (SSLIM) by taking into account the previously computed sparse representations of the video descriptors. In this way, we are able to produce a ranking of the top-N most relevant videos for the user. Note that our framework is flexible, i.e., one may use other types of modalities, autoencoders, and fusion architectures. Experimental results obtained on three real datasets (MovieLens-1M, MovieLens-10M and Vine), containing 3,320, 8,400 and 18,576 videos, respectively, show that our framework can improve recommendation results by up to 60.6% when compared to a single-modality recommendation model and by up to 31% when compared to the state-of-the-art methods used as baselines in our study, demonstrating the effectiveness of our framework and highlighting the usefulness of multimodal information in recommender systems.
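
The pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes fusion by simple concatenation of L2-normalised descriptors, a smoothed IDF variant, hypothetical two-dimensional "visual" descriptors standing in for deep convolutional features, and a hand-set item-item weight matrix `W` in place of one learned by SSLIM. The autoencoder compression step and the learning of `W` are omitted; only the TF-IDF description, the fusion, and the final top-N ranking by a sparse linear model are shown. All function names are illustrative.

```python
import math
from collections import Counter


def tfidf(docs):
    """Bag-of-words TF-IDF vectors for tokenised documents (lists of words)."""
    vocab = sorted({w for d in docs for w in d})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for d in docs for w in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        # Smoothed IDF (an assumption; the abstract does not state the variant).
        vectors.append(
            [tf[w] * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w in vocab]
        )
    return vocab, vectors


def l2_normalise(v):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(text_vec, visual_vec):
    """Early fusion (assumed here): concatenate the normalised descriptors."""
    return l2_normalise(text_vec) + l2_normalise(visual_vec)


def top_n(user_row, W, n):
    """Rank unseen items by score_j = sum_i user_row[i] * W[i][j]."""
    n_items = len(user_row)
    scores = [
        sum(user_row[i] * W[i][j] for i in range(n_items)) for j in range(n_items)
    ]
    unseen = [j for j in range(n_items) if user_row[j] == 0]
    return sorted(unseen, key=lambda j: -scores[j])[:n]


# Toy data: three videos with tokenised text and hypothetical visual descriptors.
docs = [["space", "battle"], ["space", "romance"], ["cooking", "show"]]
visual = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]

vocab, text = tfidf(docs)
fused = [fuse(t, v) for t, v in zip(text, visual)]

# Hand-set item-item weights (hypothetical; SSLIM would learn these).
W = [[0.0, 0.7, 0.1],
     [0.7, 0.0, 0.1],
     [0.1, 0.1, 0.0]]
user_row = [1, 0, 0]  # the user has interacted with video 0 only
recommended = top_n(user_row, W, 2)
```

In the framework described in the abstract, the fused descriptors would next be compressed by an autoencoder and supplied as side information when the weight matrix is learned; the ranking step itself stays a weighted sum over the items the user has already interacted with.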


Keywords: Recommender systems; Autoencoders; Data fusion; Multimodal representation



The authors would like to thank CNPq (Procs. 307510/2017-4, 313163/2014-6, 431458/2016-2 and 309291/2017-8), FAPEMIG (Procs. PPM-00542-15, APQ-03445-16 and FAPEMIG-PRONEX-MASWeb, Models, Algorithms and Systems for the Web, Proc. APQ-01400-14), CEFET-MG and CAPES for their support.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computing, CEFET-MG, Belo Horizonte, Brazil
  2. Department of Computer Science, UFMG, Belo Horizonte, Brazil
