Artificial Intelligence and Law

, Volume 22, Issue 2, pp 175–209 | Cite as

Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence

  • Bettina BerendtEmail author
  • Sören Preibusch


Decision makers in banking, insurance or employment mitigate many of their risks by telling “good” individuals and “bad” individuals apart. Laws codify societal understandings of which factors are legitimate grounds for differential treatment (and when and in which contexts)—or are considered unfair discrimination, including gender, ethnicity or age. Discrimination-aware data mining (DADM) implements the hope that information technology supporting the decision process can also keep it free from unjust grounds. However, constraining data mining to exclude a fixed enumeration of potentially discriminatory features is insufficient. We argue for complementing it with exploratory DADM, where discriminatory patterns are discovered and flagged rather than suppressed. This article discusses the relative merits of constraint-oriented and exploratory DADM from a conceptual viewpoint. In addition, we consider the case of loan applications to empirically assess the fitness of both discrimination-aware data mining approaches for two of their typical usage scenarios: prevention and detection. Using Mechanical Turk, 215 US-based participants were randomly placed in the roles of a bank clerk (discrimination prevention) or a citizen / policy advisor (detection). They were tasked to recommend or predict the approval or denial of a loan, across three experimental conditions: discrimination-unaware data mining, exploratory, and constraint-oriented DADM (eDADM resp. cDADM). The discrimination-aware tool support in the eDADM and cDADM treatments led to significantly higher proportions of correct decisions, which were also motivated more accurately. There is significant evidence that the relative advantage of discrimination-aware techniques depends on their intended usage. For users focussed on making and motivating their decisions in non-discriminatory ways, cDADM resulted in more accurate and less discriminatory results than eDADM. For users focussed on monitoring for preventing discriminatory decisions and motivating these conclusions, eDADM yielded more accurate results than cDADM.


Discrimination discovery and prevention Data mining for decision support Discrimination-aware data mining Responsible data mining Evaluation User studies Online experiment Mechanical Turk 



We thank Brendan Van Alsenoy and Albrecht Zimmermann for many inspiring discussions and valuable comments on an earlier version of the paper, and the Flemish Agency for Innovation through Science and Technology (IWT) and the Fonds Wetenschappelijk Onderzoek—Vlaanderen (FWO) for support through the projects SPION (Grant Number 100048) resp. Data Mining for Privacy in Social Networks (Grant Number 65269).


  1. Alhadeff J, Van Alsenoy B, Dumortier J (2011) The accountability principle in data protection regulation: origin, development and future directions. Presented at the privacy and accountability 2011 conference, Berlin, 5–6 Apr 2011. 11 Oct 2013
  2. Arnott D (2006) Cognitive biases and decision support systems development: a design science approach. Inf Syst J 16(1):55–78CrossRefGoogle Scholar
  3. Avraham R, Logue KD, Schwarcz D (2013) Understanding insurance anti-discrimination laws. Technical report. U of Michigan law & econ research paper no. 12-017; U of Michigan public law research paper no. 289; U of Texas Law. Law and econ research paper no. 234; Minnesota legal studies research paper no. 12-45. 20 Aug 2013
  4. Berendt B (2012) More than modelling and hiding: towards a comprehensive view of web mining and privacy. Data Min Knowl Discov 24(3):697–737CrossRefGoogle Scholar
  5. Berendt B, Preibusch S (2012) Exploring discrimination: a user-centric evaluation of discrimination-aware data mining. In: Vreeken et al. (2012), pp 344–351Google Scholar
  6. Berendt B, Preibusch S, Teltzrow M (2008) A privacy-protecting business-analytics service for online transactions. Int J Electron Commer 12:115–150CrossRefGoogle Scholar
  7. Boston Consulting Group (2012) The value of our digital identity. Liberty global policy series. 20 Aug 2013
  8. Bresnahan J, Shapiro M (1966) A general equation and technique for the exact partitioning of chi-square contingency tables. Psychol Bull 66:252–262CrossRefGoogle Scholar
  9. Calders T, Verwer S (2010) Three naive Bayes approaches for discrimination-free classification. Data Min Knowl Discov 21(2):277–292CrossRefMathSciNetGoogle Scholar
  10. Chen JQ, Lee SM (2003) An exploratory cognitive DSS for strategic decision making. Decis Support Syst 36(2):147–160CrossRefGoogle Scholar
  11. Duhigg C (2009) What does your credit-card company know about you? New York Times, 12 May 2009. 20 Aug 2013
  12. Eickhoff C, de Vries AP (2013) Increasing cheat robustness of crowdsourcing tasks. Inf Retr 16(2):121–137CrossRefGoogle Scholar
  13. Erickson TA, Mattson ME (1981) From words to meaning: a semantic illusion. J Verbal Learn Verbal Behav 20:540–552CrossRefGoogle Scholar
  14. EU (2004/2012) Council Directive 2004/113/EC of 13 December 2004 implementing the principle of equal treatment between men and women in the access to and supply of goods and services. 20 Aug 2013
  15. EU (2006) Directive 2006/54/EC of the European Parliament and of the Council of 5 July 2006 on the implementation of the principle of equal opportunities and equal treatment of men and women in matters of employment and occupation. 20 Aug 2013
  16. European Commission (2012) How does the data protection reform strengthen citizens’ rights? 20 Aug 2013
  17. European Court of Justice (2011) Case C-236/09, Association Belge des Consommateurs Test-Achats ASBL and Others v Conseil des ministres. 20 Aug 2013
  18. Fayyad UM, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery: an overview. In: Fayyad U, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. MIT Press, Cambridge, MA, pp 1–34Google Scholar
  19. Federal Trade Commission (2012) Protecting consumer privacy in an era of rapid change: recommendations for businesses and policymakers. FTC report. 20 Aug 2013
  20. Fine C (2010) Delusions of gender. The real science behind sex differences. Icon Books, LondonGoogle Scholar
  21. Gao B, Berendt B (2011) Visual data mining for higher-level patterns: discrimination-aware data mining and beyond. In: Proceedings of the 20th machine learning conference of Belgium and The Netherlands. 20 Aug 2013
  22. Goodman J, Cryder C, Cheema A (2012) Data collection in a flat world: the strengths and weaknesses of Mechanical Turk samples. J Behav Decis Mak 26:213–224CrossRefGoogle Scholar
  23. Gutwirth S, De Hert P (2006) Privacy, data protection and law enforcement. Opacity of the individual and transparency of power. In: Claes E, Duff A, Gutwirth S (eds) Privacy and the criminal law. Intersentia, Antwerp, pp 61–104Google Scholar
  24. Hajian S (2013) Simultaneous discrimination prevention and privacy protection in data publishing and mining. PhD thesis, Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, Tarragona, CataloniaGoogle Scholar
  25. Hajian S, Domingo-Ferrer J (2013) Direct and indirect discrimination prevention methods. In: Custers B, Calders T, Schermer B, Zarsky TZ (eds) Discrimination and privacy in the information society, studies in applied philosophy, epistemology and rational ethics, vol 3. Springer, Berlin, pp 241–254CrossRefGoogle Scholar
  26. Hajian S, Domingo-Ferrer J (2013) A methodology for direct and indirect discrimination prevention in data mining. IEEE Trans Knowl Data Eng 25(7):1445–1459CrossRefGoogle Scholar
  27. Hajian S, Domingo-Ferrer J, Martínez-Ballesté A (2011) Discrimination prevention in data mining for intrusion and crime detection. In: IEEE SSCI 2011Google Scholar
  28. Hajian S, Monreale A, Pedreschi D, Domingo-Ferrer J, Giannotti F (2012) Injecting discrimination and privacy awareness into pattern discovery. In: Vreeken et al. (2012), pp 360–369Google Scholar
  29. Heckerman D (2013) From wet to dry: how machine learning and big data are changing the face of biological sciences.
  30. Kamiran F, Calders T, Pechenizkiy M (2010) Discrimination aware decision tree learning. In: Proceedings of ICDM’10, pp 869–874Google Scholar
  31. Kamiran F, Karim A, Verwer S, Goudriaan H (2012) Classifying socially sensitive data without discrimination: an analysis of a crime suspect dataset. In: Vreeken et al. (2012), pp 370–377Google Scholar
  32. Kamiran F, Zliobaite I, Calders T (2013) Quantifying explainable discrimination and removing illegal discrimination in automated decision making. Knowl Inf Syst 35(3):613–644CrossRefGoogle Scholar
  33. Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Considerations on fairness-aware data mining. In: Vreeken et al. (2012), pp 378–385Google Scholar
  34. Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Fairness-aware classifier with prejudice remover regularizer. In: ECML/PKDD (2), LNCS, vol 7524, pp 35–50. SpringerGoogle Scholar
  35. Kaplan B (2001) Evaluating informatics applications—clinical decision support systems literature review. Int J Med Inform 64(1):15–37CrossRefGoogle Scholar
  36. Knudsen S (2006) Intersectionality—a theoretical inspiration in the analysis of minority cultures and identities in textbooks. In: Caught in the web or lost in the textbook, pp 61–76. 20 Aug 2013
  37. Lewis JR (1995) IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use. Int J Hum–Comput Interact 7(1):57–78. 31 July 2012Google Scholar
  38. Luong BT (2011) Generalized discrimination discovery on semi-structured data supported by ontology. PhD thesis, IMT Institute for Advanced Studies, Lucca, ItalyGoogle Scholar
  39. Luong BT, Ruggieri S, Turini F (2011) k-nn as an implementation of situation testing for discrimination discovery and prevention. In: KDD, pp 502–510. ACMGoogle Scholar
  40. Mancuhan K, Clifton C (2012) Discriminatory decision policy aware classification. In: Vreeken et al. (2012), pp 386–393Google Scholar
  41. Marghescu D, Rajanen M, Back B (2004) Evaluating the quality of use of visual data-mining tools. In: Proceedings of 11th European conference on IT evaluation, 11–12 Nov 2004, Amsterdam, pp 239–250. Academic Conferences LimitedGoogle Scholar
  42. Microsoft (2012) New York City Police Department and Microsoft partner to bring real-time crime prevention and counterterrorism technology solution to global law enforcement agencies. 20 Aug 2013
  43. Newman DJ, Hettich S, Blake CL, Merz CJ (1998) UCI repository of machine learning databases. GCD at 20 Aug 2013
  44. Park H, Reder ML (2004) Moses illusion. In: Pohl FR (ed) Cognitive illusions, pp 275–291. Psychology Press, LondonGoogle Scholar
  45. Pedreschi D, Ruggieri S, Turini F (2008) Discrimination-aware data mining. In: Proceedings of KDD’08, pp 560–568. ACMGoogle Scholar
  46. Pedreschi D, Ruggieri S, Turini F (2009) Integrating induction and deduction for finding evidence of discrimination. In: ICAIL, pp 157–166. ACMGoogle Scholar
  47. Pedreschi D, Ruggieri S, Turini F (2009) Measuring discrimination in socially-sensitive decision records. In: SDM, pp 581–592Google Scholar
  48. Pedreschi D, Ruggieri S, Turini F (2012) A study of top-k measures for discrimination discovery. In: SAC ’12, pp 126–131. ACM, New York, NY, USAGoogle Scholar
  49. Perer A, Shneiderman B (2009) Integrating statistics and visualization for exploratory power: from long-term case studies to design guidelines. IEEE Comput Graphics Appl 29(3):39–51CrossRefGoogle Scholar
  50. Pitt G (2009) Genuine occupational requirements. EC anti-discrimination legislation for legal practitioners, 27–28 Apr 2009, Trier, Germany. 20 Aug 2013
  51. Plaisant C (2004) The challenge of information visualization evaluation. In: Costabile MF (ed) AVI, pp 109–116. ACM Press, New YorkGoogle Scholar
  52. Romei A, Ruggieri S (2014) A multidisciplinary survey on discrimination analysis. Knowl Eng Rev (to appear). doi: 10.1017/S0269888913000039
  53. Ruggieri S, Pedreschi D, Turini F (2010) Data mining for discrimination discovery. TKDD ACM Trans Knowl Discov 4(2):1–40Google Scholar
  54. Ruggieri S, Pedreschi D, Turini F (2010) DCUBE: discrimination discovery in databases. In: Proceedings of SIGMOD’10, pp 1127–1130Google Scholar
  55. Schanze E (2013) Injustice by generalization. Notes on the Test-Achats decision of the European Court of Justice. Ger Law J 14(2):423–433Google Scholar
  56. Sedlmair M, Meyer M, Munzner T (2012) Design study methodology: reflections from the trenches and the stacks. IEEE Trans Vis Comput Graphics 18(12):2431–2440Google Scholar
  57. Shearer C (2000) The CRISP-DM model: the new blueprint for data mining. J Data Warehous 5(4):13–22Google Scholar
  58. Sykes JB (ed) (1982) The concise Oxford dictionary, 7th edn. Oxford University Press, OxfordGoogle Scholar
  59. Vreeken J, Ling C, Zaki MJ, Siebes A, Yu JX, Goethals B, Webb GI, Wu X (eds) (2012) 12th IEEE ICDM workshops, Brussels, Belgium, 10 Dec 2012. IEEE Computer SocietyGoogle Scholar
  60. Yin X, Han J (2003) Cpar: classification based on predictive association rules. In: Barbará D, Kamath C (eds) SDM. SIAM, Philadelphia, PAGoogle Scholar
  61. Zuccon G, Leelanupab T, Whiting S, Yilmaz E, Jose JM, Azzopardi L (2013) Crowdsourcing interactions: using crowdsourcing for evaluating interactive information retrieval systems. Inf Retr 16(2):267–305CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. 1.Department of Computer ScienceKU LeuvenLeuvenBelgium
  2. 2.Microsoft ResearchCambridgeUK

Personalised recommendations