Supervised Learning Methods for Fraud Detection in Healthcare Insurance

  • Prerna Dua
  • Sonali Bais
Part of the Intelligent Systems Reference Library book series (ISRL, volume 56)


Fraud in the healthcare system is a major problem whose rampant growth has deeply affected the US government. Beyond the financial losses it causes, patients who genuinely need medical care suffer because services become unavailable for lack of funds. Healthcare fraud is committed in different ways and at different levels, which makes the fraud detection process more challenging. The data used for detecting healthcare fraud, provided primarily by insurance companies, is massive, making it impossible to audit manually for fraudulent behavior. Data-mining and machine-learning techniques hold the promise of providing sophisticated tools for analyzing fraudulent patterns in these vast health insurance databases. Among data-mining methodologies, supervised classification has emerged as a key step in understanding fraudulent and non-fraudulent transaction activity, because classifiers can be trained and adjusted to detect complex and growing fraud schemes. This chapter provides a comprehensive survey of data-mining fraud detection models for healthcare that are based on supervised machine-learning techniques.


Healthcare fraud; Fraud detection; Supervised methods; Unsupervised methods



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Department of Health Informatics and Information Management, Louisiana Tech University, Ruston, USA
  2. School of Biological Sciences, Louisiana Tech University, Ruston, USA
  3. Department of Computer Science, Louisiana Tech University, Ruston, USA
