Extreme Sample Classification and Credit Card Fraud Detection

  • José R. Dorronsoro
  • Ana M. González
  • Carlos Santa Cruz
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 105)


Credit card fraud detection is a difficult problem for two reasons. First, legitimate operations overwhelmingly outnumber fraudulent ones. Second, many fraudulent operations closely resemble legitimate ones. In other words, catching a fraudulent operation is akin to finding needles in a haystack, except that some of the needles are in fact hay! In problems of this type, which we term Extreme Sample problems below, well-established methods for classifier construction, such as Multilayer Perceptrons (MLPs), may fail. An alternative method, Non Linear Discriminant Analysis, is described here, and some issues pertaining to its practical use, such as fast convergence and architecture selection, are also discussed. Its performance is compared with that of MLPs on Extreme Sample problems, and it is shown to give better results on both synthetic data and credit card fraud data.


  • Hide Layer
  • Linear Discriminant Analysis
  • Credit Card
  • Hide Unit
  • Fisher Information Matrix
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • José R. Dorronsoro
    • 1
  • Ana M. González
    • 1
  • Carlos Santa Cruz
    • 1
  1. Department of Computer Engineering and Instituto de Ingeniería del Conocimiento, Universidad Autónoma de Madrid, Madrid, Spain
