Abstract
Because electronic transactions generate large amounts of data, finding the best set of features is essential for identifying fraud. Fraud detection is a specific application of anomaly detection, characterized by a large imbalance between the classes, which can be detrimental to feature selection techniques. In this work we evaluate the behavior and impact of feature selection techniques for detecting fraud in a Web transaction scenario. To measure the effectiveness of the feature selection approach, we use state-of-the-art classification techniques to identify fraud in real application data. Our results show that the imbalance between the classes reduces the effectiveness of feature selection and that a resampling strategy applied to this task improves the final results. We achieve very good performance, reducing the number of features and obtaining financial gains of up to 57.5% compared to the company's current scenario.
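The idea of resampling before feature selection can be illustrated with a minimal sketch. The abstract does not name the specific resampling or feature selection techniques, so random undersampling and information gain are used here purely as stand-ins, and the toy data and all function names are hypothetical rather than taken from the paper.

```python
import math
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one discrete feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return feature indices ordered from highest to lowest information gain."""
    n_features = len(rows[0])
    scores = [(information_gain(rows, labels, f), f) for f in range(n_features)]
    return [f for _, f in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical toy data: 980 legitimate (label 0) and 20 fraudulent (label 1) rows.
# Feature 0 separates the classes; feature 1 alternates and carries no signal.
rows = [(0, i % 2) for i in range(980)] + [(1, i % 2) for i in range(20)]
labels = [0] * 980 + [1] * 20

# Balance the classes first, then rank features on the balanced sample.
bal_rows, bal_labels = undersample(rows, labels)
ranking = rank_features(bal_rows, bal_labels)
print(Counter(bal_labels), ranking)
```

On real transaction data the same two-step order applies: rebalance the training set, then score the features. The choice of resampler and scoring metric would need to be evaluated empirically.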
Acknowledgment
This research was supported by the Brazilian National Institute of Science and Technology for the Web (CNPq grant numbers 573871/2008-6 and 477709/2012-5), MASWeb (grant FAPEMIG/PRONEX APQ-01400-14), EUBra-BIGSEA (H2020-EU.2.1.1 690116, Brazil/MCTI/RNP GA-000650/04), CAPES, CNPq, Fapemig and Universo OnLine Inc. (UOL).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Lima, R.F., Pereira, A.C.M. (2017). Feature Selection Approaches to Fraud Detection in e-Payment Systems. In: Bridge, D., Stuckenschmidt, H. (eds) E-Commerce and Web Technologies. EC-Web 2016. Lecture Notes in Business Information Processing, vol 278. Springer, Cham. https://doi.org/10.1007/978-3-319-53676-7_9
DOI: https://doi.org/10.1007/978-3-319-53676-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53675-0
Online ISBN: 978-3-319-53676-7
eBook Packages: Computer Science, Computer Science (R0)