A Novel Ensemble Approach for Click-Through Rate Prediction Based on Factorization Machines and Gradient Boosting Decision Trees

  • Xiaochen Wang
  • Gang Hu
  • Haoyang Lin
  • Jiayu Sun
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11642)


Click-Through Rate (CTR) prediction is a significant technique in computational advertising; its accuracy directly affects companies' profits and user experience. Learning the complicated feature interactions behind user behaviors, and thereby generalizing well, is critical to improving CTR in recommender systems. Factorization Machines (FM) are a popular recommendation method that efficiently models second-order feature interactions. However, because FM models features linearly and uses only second-order interactions, it cannot capture the nonlinear, complex patterns present in real-world data. In this paper, we propose GFM, an ensemble of FM and Gradient Boosting Decision Trees (GBDT) for recommendation. We use FM to model linear features and second-order feature interactions, and GBDT to model side information by transforming raw features into cross-combined features. In addition, we introduce an attention mechanism to compute users' latent attention on different features. To evaluate GFM, we conduct experiments on two real-world datasets, a movie dataset and a music dataset; the results show that our model provides accurate recommendations.
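For readers unfamiliar with the FM component mentioned above, the following is a minimal sketch of the standard second-order FM score: a bias, a linear term, and pairwise interactions weighted by inner products of latent vectors. It is not the GFM model itself. The names `fm_score`, `fm_score_naive`, and `sigmoid`, the toy feature vector, and the random parameters are all assumptions made for illustration. The sketch also checks that the usual linear-time reformulation of the pairwise term agrees with the direct double loop.

```python
import math
import random


def fm_score(x, w0, w, V):
    """Second-order FM score using the O(k * n) reformulation.

    x: feature values (length n); w0: bias; w: linear weights (length n);
    V: latent vectors, one list of length k per feature.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum of v_if * x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum of squares
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same score computed directly over all feature pairs i < j."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            score += dot * x[i] * x[j]
    return score


def sigmoid(z):
    """Map a raw score to a click probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy input: a sparse, mostly one-hot feature vector.
random.seed(0)
n, k = 6, 3
x = [1.0, 0.0, 0.0, 1.0, 0.0, 0.5]
w0 = 0.1
w = [random.gauss(0.0, 1.0) for _ in range(n)]
V = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]

fast = fm_score(x, w0, w, V)
slow = fm_score_naive(x, w0, w, V)
ctr = sigmoid(fast)
```

Both functions return the same value up to floating-point error. Each pairwise weight is a single inner product of two latent vectors, which is why FM alone is limited to second-order interactions and why the abstract pairs it with a tree-based feature transformation.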


Factorization Machines; Gradient Boosting Decision Trees; CTR prediction; Attention



This work is supported by the National Natural Science Foundation of China (grants No. 61672133 and No. 61832001).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Xiaochen Wang (1), corresponding author
  • Gang Hu (1)
  • Haoyang Lin (1)
  • Jiayu Sun (1)
  1. Center for Future Media, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
