Reconciling predictive and interpretable performance in repeat buyer prediction via model distillation and heterogeneous classifiers fusion


Repeat buyer prediction is crucial for e-commerce companies seeking to enhance customer service and product sales. In particular, knowing which factors or rules drive repeat purchases is as important in business as knowing the prediction outcomes themselves. An interpretable model with excellent predictive performance is therefore required. Many classifiers, such as the multilayer perceptron, have exceptional predictive ability but lack interpretability. Tree-based models are interpretable, but their predictive performance is usually limited. Motivated by these observations, we design an approach that balances the predictive and interpretable performance of a decision tree through model distillation and heterogeneous classifier fusion. Specifically, we first train multiple heterogeneous classifiers and integrate them through diverse combination operators; the resulting classifier combination serves as the teacher model. Soft targets are then obtained from the teacher and used to guide the training of the decision tree. We evaluate the approach on a real-world repeat buyer prediction dataset, using features from three aspects: users, merchants, and user–merchant pairs. Our experimental results show that both the accuracy and the AUC of the decision tree are improved, and we provide model interpretations from these three aspects.
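The pipeline described in the abstract — train heterogeneous classifiers, fuse their outputs with a combination operator, then distill the fused teacher into a decision tree via soft targets — can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's implementation: the base learners, the mean-probability combination operator, and all hyperparameters are illustrative assumptions.

```python
# Sketch of distillation from a heterogeneous classifier fusion into a
# decision tree. Hypothetical setup: synthetic data stands in for the
# repeat-buyer features, and averaging probabilities is one simple
# combination operator among the "diverse" ones the paper mentions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1. Train multiple heterogeneous base classifiers.
teachers = [
    LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
    RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                  random_state=0).fit(X_tr, y_tr),
]

# 2. Fuse them into a teacher: here, the mean of predicted probabilities.
def teacher_proba(X):
    return np.mean([t.predict_proba(X)[:, 1] for t in teachers], axis=0)

# 3. Distill: fit a shallow regression tree on the teacher's soft targets,
#    so the interpretable student mimics the fused teacher's scores.
student = DecisionTreeRegressor(max_depth=5, random_state=0)
student.fit(X_tr, teacher_proba(X_tr))

# Baseline for comparison: the same-depth tree trained on hard labels.
baseline = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)

auc_student = roc_auc_score(y_te, student.predict(X_te))
auc_baseline = roc_auc_score(y_te, baseline.predict_proba(X_te)[:, 1])
print(f"hard-label tree AUC: {auc_baseline:.3f}, "
      f"distilled tree AUC: {auc_student:.3f}")
```

The student remains a single decision tree, so its splits can be inspected directly for the rule-level interpretations the paper targets, while its soft-target training lets it inherit some of the teacher's ranking quality.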






This research is partially supported by the National Natural Science Foundation of China (Grants Nos. 71620107002, 61502360 and 71821001).

Author information



Corresponding author

Correspondence to Jingjing Cao.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.



About this article


Cite this article

Shen, Y., Xu, X. & Cao, J. Reconciling predictive and interpretable performance in repeat buyer prediction via model distillation and heterogeneous classifiers fusion. Neural Comput & Applic 32, 9495–9508 (2020).



Keywords

  • Model distillation
  • Heterogeneous classifier fusion
  • Interpretable models
  • Repeat buyer prediction