Feature Selection Approaches to Fraud Detection in e-Payment Systems

Lima, Rafael Franca; Pereira, Adriano C. M.

doi:10.1007/978-3-319-53676-7_9

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 278)

Included in the following conference series:

International Conference on Electronic Commerce and Web Technologies

Abstract

Electronic transactions generate large amounts of data, so finding the best set of features is essential for identifying fraud. Fraud detection is a specific application of anomaly detection, characterized by a large imbalance between the classes, which can be detrimental to feature selection techniques. In this work we evaluate the behavior and impact of feature selection techniques for detecting fraud in a Web transaction scenario. To measure the effectiveness of each feature selection approach, we apply state-of-the-art classification techniques to real application data. Our results show that the imbalance between the classes reduces the effectiveness of feature selection and that applying a resampling strategy to this task improves the final results. We achieve very good performance, reducing the number of features and obtaining financial gains of up to 57.5% compared to the company's current scenario.
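
The idea of resampling before feature selection can be illustrated with a minimal sketch. The abstract does not name the specific resampling or selection methods, so everything here is an assumption made for illustration: random undersampling of the majority class, a simple filter score (standardized difference of class means), synthetic data, and the hypothetical helper names `undersample` and `rank_features`. It is not the authors' pipeline.

```python
import random
import statistics


def undersample(rows, labels, rng):
    """Random undersampling: keep every fraud row (label 1) and an
    equally sized random sample of legitimate rows (label 0)."""
    fraud = [i for i, y in enumerate(labels) if y == 1]
    legit = [i for i, y in enumerate(labels) if y == 0]
    kept = fraud + rng.sample(legit, len(fraud))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def rank_features(rows, labels):
    """Rank feature indices, best first, by a simple filter score:
    |mean(fraud) - mean(legit)| divided by the overall spread."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        spread = statistics.pstdev(pos + neg) or 1.0  # guard against zero spread
        scores.append(abs(statistics.mean(pos) - statistics.mean(neg)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Synthetic, heavily imbalanced data (assumption): 20 frauds in 2000 rows.
# Feature 0 is informative; features 1 and 2 are pure noise.
rng = random.Random(0)
labels = [1 if i < 20 else 0 for i in range(2000)]
rows = [
    [rng.gauss(4.0 if y else 0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    for y in labels
]

# Balance the classes first, then select features on the balanced sample.
bal_rows, bal_labels = undersample(rows, labels, rng)
ranking = rank_features(bal_rows, bal_labels)
selected = ranking[:1]  # keep only the top-ranked feature
```

In a real evaluation the selected features would then be passed to a classifier and scored on held-out, unresampled data, so that the resampling affects only feature selection and training.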

Notes

1.
http://pagseguro.uol.com.br.

Acknowledgment

This research was supported by the Brazilian National Institute of Science and Technology for the Web (CNPq grant numbers 573871/2008-6 and 477709/2012-5), MASWeb (grant FAPEMIG/PRONEX APQ-01400-14), EUBra-BIGSEA (H2020-EU.2.1.1 690116, Brazil/MCTI/RNP GA-000650/04), CAPES, CNPq, Fapemig and Universo OnLine Inc. (UOL).

Author information

Authors and Affiliations

Department of Computer Science (DCC), Federal University of Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, 31270-901, Brazil
Rafael Franca Lima & Adriano C. M. Pereira

Corresponding author

Correspondence to Rafael Franca Lima.

Editor information

Editors and Affiliations

The Insight Centre for Data Analytics, University College Cork, Cork, Ireland
Derek Bridge
Data and Web Science Group, University of Mannheim, Mannheim, Germany
Heiner Stuckenschmidt

Cite this paper

Lima, R.F., Pereira, A.C.M. (2017). Feature Selection Approaches to Fraud Detection in e-Payment Systems. In: Bridge, D., Stuckenschmidt, H. (eds) E-Commerce and Web Technologies. EC-Web 2016. Lecture Notes in Business Information Processing, vol 278. Springer, Cham. https://doi.org/10.1007/978-3-319-53676-7_9

DOI: https://doi.org/10.1007/978-3-319-53676-7_9
Published: 15 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53675-0
Online ISBN: 978-3-319-53676-7
eBook Packages: Computer Science, Computer Science (R0)
