POLAR: Attention-Based CNN for One-Shot Personalized Article Recommendation

  • Zhengxiao DuEmail author
  • Jie Tang
  • Yuhui Ding
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)


In this paper, we propose POLAR, an attention-based CNN combined with one-shot learning for personalized article recommendation. Given a query, POLAR uses an attention-based CNN to estimate the relevance score between the query and related articles. The attention mechanism can help significantly improve the relevance estimation. For example, on AMiner, this can help achieve a +5.0% improvement in terms of NDCG@3. One more challenge in personalized article recommendation is how to collect statistically sufficient training data for a recommendation model. POLAR combines a one-shot learning function into the recommendation model, which further gains significant improvements. For example, on AMiner, with only  1.6 feedbacks on average, POLAR achieves 2.7% improvement by NDCG@3. We evaluate the proposed POLAR on three different datasets: AMiner, Patent, and RARD. Experimental results demonstrate the effectiveness of the proposed model. Recently, we have successfully deployed POLAR into AMiner as the recommendation engine for article recommendation, which further confirms the effectiveness of the proposed model. Data related to this paper is available at:


Personalized recommendation Term weighting CNN One-shot learning 


  1. 1.
    Beel, J., Carevic, Z., Schaible, J., Neusch, G.: RARD: the related-article recommendation dataset [data] (2017).
  2. 2.
    Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: UAI, pp. 43–52 (1998)Google Scholar
  3. 3.
    Caruana, R., Lawrence, S., Giles, L.: Overfitting in neural nets: backpropagation, conjugate gradient, and early stopping. In: NIPS, pp. 381–387 (2000)Google Scholar
  4. 4.
    Chen, K., Wang, J., Chen, L., Gao, H., Xu, W., Nevatia, R.: ABC-CNN: an attention based convolutional neural network for visual question answering. CoRR abs/1511.05960 (2015)Google Scholar
  5. 5.
    Das, A.S., Datar, M., Garg, A., Rajaram, S.: Google news personalization: scalable online collaborative filtering. In: WWW, pp. 271–280 (2007).
  6. 6.
    Das, A.: Audio visual person authentication by multiple nearest neighbor classifiers. In: Lee, S.-W., Li, S.Z. (eds.) ICB 2007. LNCS, vol. 4642, pp. 1114–1123. Springer, Heidelberg (2007). Scholar
  7. 7.
    Guo, J., Fan, Y., Ai, Q., Croft, W.B.: A deep relevance matching model for ad-hoc retrieval. In: CIKM, pp. 55–64 (2016).
  8. 8.
    Huang, P.S., He, X., Gao, J., Deng, L., Acero, A., Heck, L.: Learning deep structured semantic models for web search using clickthrough data. In: CIKM, pp. 2333–2338 (2013).
  9. 9.
    Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML, pp. 448–456 (2015)Google Scholar
  10. 10.
    Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, vol. 2 (2015)Google Scholar
  11. 11.
    Kusner, M.J., Sun, Y., Kolkin, N.I., Weinberger, K.Q.: From word embeddings to document distances. In: ICML, pp. 957–966 (2015)Google Scholar
  12. 12.
    Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196 (2014)Google Scholar
  13. 13.
    Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). Scholar
  14. 14.
    Li, F., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 594–611 (2006). Scholar
  15. 15.
    Li, L., Wang, D., Li, T., Knox, D., Padmanabhan, B.: SCENE: a scalable two-stage personalized news recommendation system. In: SIGIR, pp. 125–134 (2011).
  16. 16.
    Marlin, B., Zemel, R.S.: The multiple multiplicative factor model for collaborative filtering. In: ICML, p. 73 (2004).
  17. 17.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)Google Scholar
  18. 18.
    Mitra, B., Diaz, F., Craswell, N.: Learning to match using local and distributed representations of text for web search. In: WWW, pp. 1291–1299 (2017).
  19. 19.
    Palangi, H., et al.: Deep sentence embedding using long short-term memory networks: analysis and application to information retrieval. IEEE/ACM Trans. Audio Speech Lang. Proc. 24(4), 694–707 (2016). Scholar
  20. 20.
    Pang, L., Lan, Y., Guo, J., Xu, J., Wan, S., Cheng, X.: Text matching as image recognition. In: AAAI, pp. 2793–2799 (2016)Google Scholar
  21. 21.
    Pazzani, M.J., Billsus, D.: Content-based recommendation systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.) The Adaptive Web. LNCS, vol. 4321, pp. 325–341. Springer, Heidelberg (2007). Scholar
  22. 22.
    Rendle, S.: Factorization machines. In: ICDM, pp. 995–1000 (2010).
  23. 23.
    Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC-3. In: TREC, pp. 109–126 (1994)Google Scholar
  24. 24.
    Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)CrossRefGoogle Scholar
  25. 25.
    Salton, G., Fox, E.A., Wu, H.: Extended boolean information retrieval. Commun. ACM 26(11), 1022–1036 (1983). Scholar
  26. 26.
    Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. JMLR 15(1), 1929–1958 (2014)MathSciNetzbMATHGoogle Scholar
  27. 27.
    Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: ArnetMiner: extraction and mining of academic social networks. In: SIGKDD, pp. 990–998 (2008).
  28. 28.
    Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: Matching networks for one shot learning. In: NIPS, pp. 3630–3638 (2016)Google Scholar
  29. 29.
    Wan, S., Lan, Y., Guo, J., Xu, J., Pang, L., Cheng, X.: A deep architecture for semantic matching with multiple positional sentence representations. In: AAAI, pp. 2835–2841 (2016)Google Scholar
  30. 30.
    Xiao, T., Xu, Y., Yang, K., Zhang, J., Peng, Y., Zhang, Z.: The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In: CVPR, pp. 842–850 (2015).
  31. 31.
    Xiong, C., Dai, Z., Callan, J., Liu, Z., Power, R.: End-to-end neural ad-hoc ranking with kernel pooling. In: SIGIR, pp. 55–64 (2017).
  32. 32.
    Yin, W., Schütze, H., Xiang, B., Zhou, B.: ABCNN: attention-based convolutional neural network for modeling sentence pairs. TACL 4, 259–272 (2016)Google Scholar
  33. 33.
    Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). Scholar
  34. 34.
    Zheng, G., Callan, J.: Learning to reweight terms with distributed representations. In: SIGIR, pp. 575–584 (2015).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. 1.Department of Computer Science and TechnologyTsinghua UniversityBeijingChina

Personalised recommendations