Cognitive Computation, Volume 11, Issue 2, pp 262–270

Large-scale Ensemble Model for Customer Churn Prediction in Search Ads

  • Qiu-Feng Wang
  • Mirror Xu
  • Amir Hussain


Customer churn prediction is one of the most important issues in managing the search ads business, a multi-billion market. The aim of churn prediction is to detect customers with a high propensity to leave the ads platform, so that they can be analyzed and retention efforts can be increased ahead of time. An ensemble model combines multiple weak models to obtain better predictive performance; the approach is inspired by the human cognitive system and is widely used in machine learning applications. In this paper, we investigate how an ensemble model, the gradient boosting decision tree (GBDT), can predict whether a customer will become a churner in the foreseeable future based on its activities in search ads. We extract two types of features for the GBDT: dynamic features and static features. For dynamic features, we consider a sequence of the customer’s activities (e.g., impressions, clicks) over a long period. For static features, we consider the customer’s settings (e.g., creation time, customer type). We evaluated the prediction performance on a large-scale customer data set from the Bing Ads platform. The results show that the static and dynamic features are complementary, and combining all features yields an AUC (area under the ROC curve) of 0.8410 on the test set. The proposed model is useful for predicting which customers will churn in the near future on the ads platform, and it has been successfully run daily on the Bing Ads platform.
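To make the idea in the abstract concrete, the following is a minimal sketch of combining static and dynamic features, fitting a small gradient-boosted ensemble, and scoring it with AUC. It is not the authors' pipeline: the toy customers, the feature choices (mean activity, trend, last value), the depth-1 trees with mean-residual leaves, and all function names are assumptions made for illustration only.

```python
import math
from statistics import mean


def dynamic_features(series):
    """Summarize an activity sequence (e.g., daily clicks) as fixed-length features."""
    half = len(series) // 2
    early, late = series[:half], series[half:]
    # Overall level, change between the two halves, and the most recent value.
    return [mean(series), mean(late) - mean(early), float(series[-1])]


def build_features(static, series):
    """Concatenate static account settings with dynamic activity summaries."""
    return list(static) + dynamic_features(series)


def fit_stump(X, residuals):
    """Find the one-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = mean(left), mean(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return None if best is None else best[1:]


def fit_gbdt(X, y, n_rounds=20, lr=0.3):
    """Simplified gradient boosting on log loss with depth-1 trees (stumps)."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log loss: label minus predicted probability.
        residuals = [yi - 1.0 / (1.0 + math.exp(-s)) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        j, t, lv, rv = stump
        scores = [s + lr * (lv if row[j] <= t else rv) for row, s in zip(X, scores)]
    return stumps


def predict_score(stumps, row, lr=0.3):
    """Sum the shrunken stump outputs; a higher score means more likely to churn."""
    return sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)


def auc(labels, scores):
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: ([account age in days, customer type], daily clicks, churned)
customers = [
    ([400, 0], [9, 8, 9, 10, 9, 11], 0),
    ([30, 1], [8, 6, 4, 2, 1, 0], 1),
    ([250, 0], [5, 5, 6, 5, 6, 6], 0),
    ([45, 1], [7, 7, 5, 3, 1, 1], 1),
    ([300, 1], [3, 4, 4, 5, 4, 5], 0),
    ([20, 0], [6, 4, 3, 1, 0, 0], 1),
]
X = [build_features(static, clicks) for static, clicks, _ in customers]
y = [label for _, _, label in customers]
model = fit_gbdt(X, y)
train_auc = auc(y, [predict_score(model, row) for row in X])
```

The AUC here is computed on the same toy rows used for fitting, so it says nothing about generalization; a realistic evaluation would use a held-out test set, as the abstract describes.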


Churn prediction, Ensemble model, Machine learning, Search ads, Static features, Dynamic features



We would also like to thank all members of the Bing Ads Adinsight team and the PM team at Microsoft for their discussions and help with this work.

Funding Information

This study was funded by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant nos. 17KJB520041 and 17KJD520010; the Natural Science Foundation of Jiangsu Province under grant nos. BK20181189 and BK20181190; the Open Project Fund of the National Laboratory of Pattern Recognition under grant no. 201800020; the Key Program Special Fund in XJTLU under grant nos. KSF-A-10, KSF-A-01 and KSF-P-02; and the XJTLU Research Development Fund under grant no. RDF-16-02-49. In addition, A. Hussain was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) grant AV-COGHEAR (grant reference number EP/M026981/1).

Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018
corrected publication [December/2018]

Authors and Affiliations

  1. Xi’an Jiaotong-Liverpool University (XJTLU), Suzhou, People’s Republic of China
  2. Microsoft Corporation, Beijing, People’s Republic of China
  3. Edinburgh Napier University, Edinburgh, Scotland
