Refined Weighted Random Forest and Its Application to Credit Card Fraud Detection

  • Shiyang Xuan
  • Guanjun Liu
  • Zhenchuan Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11280)


Random forest (RF) is widely used in many applications because of its good classification performance. However, its voting mechanism assumes that all base classifiers have the same weight. It is more reasonable to give some trees relatively high weights and others relatively low weights, because the randomness of bootstrap sampling and attribute selection cannot guarantee that all trees have the same decision-making ability. This paper focuses on the weighted voting mechanism and proposes a novel weighted RF. Experiments on 6 public datasets show that our method outperforms both the standard RF and another weighted RF. We also apply our method to credit card fraud detection, where experiments show that it performs best among the compared methods.
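The difference between equal voting and weighted voting can be illustrated with a minimal sketch. It is not the weighting scheme proposed in the paper, which the abstract does not specify. The function name `weighted_vote`, the example labels, and the weight values are all illustrative assumptions, and the weights are simply supplied by the caller.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine per-tree class predictions into one label.

    predictions: list of class labels, one per tree.
    weights: optional list of non-negative floats, one per tree.
             If omitted, every tree gets weight 1.0 (plain majority voting).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")
    if not predictions:
        raise ValueError("need at least one prediction")

    # Accumulate the total weight behind each candidate label.
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight

    # Pick the label with the largest accumulated weight.
    # Ties go to the label that was seen first.
    return max(totals, key=totals.get)


# Hypothetical example: three trees say "legit", two say "fraud".
tree_predictions = ["legit", "legit", "legit", "fraud", "fraud"]

# Equal weights: the simple majority wins.
equal_result = weighted_vote(tree_predictions)

# Assumed weights (for illustration only): the two "fraud" trees
# are trusted more than the three "legit" trees.
assumed_weights = [0.5, 0.5, 0.5, 0.9, 0.9]
weighted_result = weighted_vote(tree_predictions, assumed_weights)
```

With equal weights the three "legit" trees outvote the two "fraud" trees. With the assumed weights the "fraud" trees accumulate 1.8 against 1.5, so the decision flips. This is the effect a weighted voting mechanism is meant to exploit when some trees are more reliable than others.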


Random forest · Weighted decision tree · Credit card fraud



The authors would like to thank the reviewers for their helpful comments, and Professor Changjun Jiang, who provided a great deal of assistance with data and experiments. This paper is supported in part by the National Natural Science Foundation of China under grant no. 61572360 and in part by the Shanghai Shuguang Program under grant no. 15SG18. The corresponding author is G.J. Liu.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Science and Technology, Tongji University, Shanghai, China
