Automobile Insurance Claims Auditing: A Comprehensive Survey on Handling Awry Datasets

Soufiane, Ezzaim; EL Baghdadi, Salah-Eddine; Berrahou, Aissam; Mesbah, Abderrahim; Berbia, Hassan

doi:10.1007/978-981-33-6893-4_13


Ezzaim Soufiane (ORCID: orcid.org/0000-0002-5974-6099), Salah-Eddine EL Baghdadi, Aissam Berrahou, Abderrahim Mesbah & Hassan Berbia

Conference paper
First Online: 22 July 2021

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 745)

Abstract

Fraud is a very costly criminal activity, and insurance companies face the challenging task of identifying and preventing fraudulent claims. Like many large-scale problems in recent years, fraud detection has seen heavy application of Machine Learning, in both supervised and unsupervised settings. Supervised models, however, usually perform poorly on awry (asymmetrical, class-imbalanced) datasets, in which fraudulent claims are far fewer than legitimate ones. This paper presents a novel approach for auditing claims in automobile insurance. Our data pipeline consists of preprocessing, feature selection, data balancing, and classification. This robust fraud detection model, built upon existing fraud detection research, gives very promising results compared to the state of the art in the industry.
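To illustrate why skewed data hurts supervised models and what a data balancing step does, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not say which balancing technique is used, so plain random oversampling is swapped in as the simplest stand-in. The toy data (94 legitimate claims, 6 fraudulent ones) and the helper names `random_oversample` and `recall` are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 94 legitimate claims (0) and 6 fraudulent ones (1).
rows = [[float(i)] for i in range(100)]
labels = [0] * 94 + [1] * 6

# A model that always answers "legitimate" looks accurate but finds no fraud.
always_legit = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_legit)) / len(labels)
fraud_recall = recall(labels, always_legit)

# Balancing step: after oversampling, both classes have the same size.
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

On this toy set the trivial model scores high on accuracy while its recall on the fraud class is zero, which is why accuracy alone is a poor yardstick on skewed data. In practice the balancing step should be applied to the training split only, so that evaluation still reflects the real class distribution.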

Author information

Authors and Affiliations

Mohammed V University, United Nations Avenue, Rabat, 10000, Morocco
Ezzaim Soufiane, Salah-Eddine EL Baghdadi, Aissam Berrahou, Abderrahim Mesbah & Hassan Berbia

Corresponding author

Correspondence to Ezzaim Soufiane.

Editor information

Editors and Affiliations

Sidi Mohamed Ben Abdellah University, Fez, Morocco
Saad Bennani, Younes Lakhrissi, Ghizlane Khaissidi, Anass Mansouri & Youness Khamlichi

About this paper

Cite this paper

Soufiane, E., EL Baghdadi, SE., Berrahou, A., Mesbah, A., Berbia, H. (2022). Automobile Insurance Claims Auditing: A Comprehensive Survey on Handling Awry Datasets. In: Bennani, S., Lakhrissi, Y., Khaissidi, G., Mansouri, A., Khamlichi, Y. (eds) WITS 2020. Lecture Notes in Electrical Engineering, vol 745. Springer, Singapore. https://doi.org/10.1007/978-981-33-6893-4_13

DOI: https://doi.org/10.1007/978-981-33-6893-4_13
Published: 22 July 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-6892-7
Online ISBN: 978-981-33-6893-4
eBook Packages: Engineering, Engineering (R0)
