Feature Selection Approaches to Fraud Detection in e-Payment Systems

  • Conference paper
E-Commerce and Web Technologies (EC-Web 2016)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 278)

Abstract

Because electronic transactions generate large volumes of data, finding the best set of features is essential for identifying fraud. Fraud detection is a specific application of anomaly detection, characterized by a large imbalance between the classes, which can be detrimental to feature selection techniques. In this work we evaluate the behavior and impact of feature selection techniques for detecting fraud in a Web transaction scenario. To measure the effectiveness of each feature selection approach, we apply state-of-the-art classification techniques to real application data. Our results show that class imbalance reduces the effectiveness of feature selection and that applying a resampling strategy to this task improves the final results. We achieve very good performance, reducing the number of features and presenting financial gains of up to 57.5% compared to the company's current scenario.
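The abstract does not name the specific resampling or feature selection techniques, so the following is only a minimal sketch of the general idea: balance the classes first, then score the features. It assumes random undersampling and information gain as stand-ins, and the feature names (`foreign_card`, `is_weekend`) and toy data are hypothetical, not taken from the paper.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction in `labels` after splitting on a discrete `column`."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def rank_features(rows, labels, names):
    """Return feature names sorted by decreasing information gain, plus the scores."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data: 95 legitimate transactions and 5 frauds.
# `foreign_card` separates the classes; `is_weekend` is an uninformative flag.
names = ["foreign_card", "is_weekend"]
rows = [(0, i % 2) for i in range(95)] + [(1, i % 2) for i in range(5)]
labels = ["legit"] * 95 + ["fraud"] * 5

# Resample first, then select features on the balanced sample.
balanced_rows, balanced_labels = undersample(rows, labels, seed=0)
ranking, scores = rank_features(balanced_rows, balanced_labels, names)
print(ranking)
```

In practice the resampling would be applied only to the training split, so that the evaluation data keeps the original class distribution.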

Notes

  1. http://pagseguro.uol.com.br

Acknowledgment

This research was supported by the Brazilian National Institute of Science and Technology for the Web (CNPq grant numbers 573871/2008-6 and 477709/2012-5), MASWeb (grant FAPEMIG/PRONEX APQ-01400-14), EUBra-BIGSEA (H2020-EU.2.1.1 690116, Brazil/MCTI/RNP GA-000650/04), CAPES, CNPq, Fapemig and Universo OnLine Inc. (UOL).

Author information

Corresponding author

Correspondence to Rafael Franca Lima.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lima, R.F., Pereira, A.C.M. (2017). Feature Selection Approaches to Fraud Detection in e-Payment Systems. In: Bridge, D., Stuckenschmidt, H. (eds) E-Commerce and Web Technologies. EC-Web 2016. Lecture Notes in Business Information Processing, vol 278. Springer, Cham. https://doi.org/10.1007/978-3-319-53676-7_9

  • DOI: https://doi.org/10.1007/978-3-319-53676-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53675-0

  • Online ISBN: 978-3-319-53676-7

  • eBook Packages: Computer Science, Computer Science (R0)