A Novel Ensemble Approach for Click-Through Rate Prediction Based on Factorization Machines and Gradient Boosting Decision Trees
Abstract
Click-Through Rate (CTR) prediction is an important technique in computational advertising; its accuracy directly affects companies’ profits and user experience. Generalizing well by learning the complicated feature interactions behind user behaviors is critical to improving CTR prediction for recommender systems. Factorization Machines (FM) are a popular recommendation method for efficiently modeling second-order feature interactions. Nevertheless, FM cannot capture the nonlinear and complex patterns present in real-world data, because it models features linearly and uses only second-order feature interactions. In this paper, we propose a model named GFM, an ensemble of FM and Gradient Boosting Decision Trees (GBDT) for recommendation. We use FM to model linear features and second-order feature interactions, and we use GBDT to model the side information by transforming the raw features into cross-combined features. In addition, we introduce an attention mechanism to estimate users’ latent attention to different features. To evaluate GFM, we conduct experiments on two real-world datasets, a movie dataset and a music dataset. The results show that our model is effective in providing accurate recommendations.
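As a rough illustration of the second-order interactions that the FM component models, the sketch below computes a standard FM score in plain Python. It is not the GFM implementation: the names `fm_score` and `fm_ctr`, the toy parameters, and the sigmoid used to map the score to a click probability are all assumptions made for this example.

```python
import math
from typing import List


def fm_score(x: List[float], w0: float, w: List[float], v: List[List[float]]) -> float:
    """Second-order FM score: w0 + sum_i w_i*x_i + sum_{i<j} <v_i, v_j>*x_i*x_j.

    x  : feature vector of length n
    w0 : global bias
    w  : linear weights, length n
    v  : latent factors, n rows of length k (hypothetical toy values below)
    """
    n = len(x)
    k = len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    # Pairwise term via the usual O(k*n) identity:
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_ctr(x: List[float], w0: float, w: List[float], v: List[List[float]]) -> float:
    """Map the FM score to (0, 1) with a sigmoid (an assumption for this sketch)."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, v)))


# Toy example: three one-hot style features, two latent dimensions.
x = [1.0, 0.0, 1.0]
w0 = 0.1
w = [0.2, 0.3, -0.1]
v = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
score = fm_score(x, w0, w, v)
prob = fm_ctr(x, w0, w, v)
```

Only features 0 and 2 are active here, so the pairwise term reduces to the single inner product of their latent vectors; the linear and bias terms are added on top.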
Keywords
Factorization Machines, Gradient Boosting Decision Trees, CTR prediction, Attention
Notes
Acknowledgments
This work is supported by the National Natural Science Foundation of China (grants No. 61672133 and No. 61832001).