WITS 2020 pp 135–144

Automobile Insurance Claims Auditing: A Comprehensive Survey on Handling Awry Datasets

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 745)


Fraud is a very costly criminal activity, and insurance companies face the challenging task of identifying and preventing fraudulent claims. As with many large problems in recent years, machine learning has been applied heavily to fraud detection, in both supervised and unsupervised settings. However, supervised models usually perform poorly on awry (asymmetrical, imbalanced) datasets. This paper presents a novel approach for auditing claims in automobile insurance. Our data pipeline consists of preprocessing, feature selection, data balancing, and classification. The resulting robust fraud detection model, built upon existing fraud detection research, gives very promising results compared with the state of the art in the industry.
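To make the imbalance problem and the data balancing stage concrete, the following is a minimal sketch and not the method used in the paper. It assumes a toy label set with a hypothetical 6% fraud rate and uses plain random oversampling as a stand-in balancing technique; the helper names `random_oversample` and `recall` are illustrative only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Toy data (assumed, not from the paper): 94 legitimate claims (0), 6 fraudulent (1).
labels = [0] * 94 + [1] * 6
rows = [[i] for i in range(100)]

# A baseline that always predicts "legitimate" scores high accuracy
# while detecting no fraud at all.
always_legit = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_legit)) / len(labels)
baseline_recall = recall(labels, always_legit)

# Balancing step: both classes end up with the same number of rows.
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

In practice a balancing step of this kind would be applied to the training split only, so that duplicated fraud cases do not leak into the evaluation data.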


Insurance fraud; Imbalanced dataset; Automobile insurance; Supervised learning



Copyright information

© Springer Nature Singapore Pte Ltd. 2022

Authors and Affiliations

  1. Mohammed V University, Rabat, Morocco
