Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence

Artificial Intelligence and Law

Abstract

Decision makers in banking, insurance or employment mitigate many of their risks by telling “good” individuals and “bad” individuals apart. Laws codify societal understandings of which factors are legitimate grounds for differential treatment (and when and in which contexts) and which, such as gender, ethnicity or age, are considered grounds of unfair discrimination. Discrimination-aware data mining (DADM) implements the hope that information technology supporting the decision process can also keep it free from unjust grounds. However, constraining data mining to exclude a fixed enumeration of potentially discriminatory features is insufficient. We argue for complementing it with exploratory DADM, where discriminatory patterns are discovered and flagged rather than suppressed. This article discusses the relative merits of constraint-oriented and exploratory DADM from a conceptual viewpoint. In addition, we consider the case of loan applications to empirically assess the fitness of both discrimination-aware data mining approaches for two of their typical usage scenarios: prevention and detection. Using Mechanical Turk, we randomly assigned 215 US-based participants to the role of a bank clerk (discrimination prevention) or of a citizen / policy advisor (detection). They were asked to recommend or predict the approval or denial of a loan, across three experimental conditions: discrimination-unaware data mining, exploratory DADM (eDADM), and constraint-oriented DADM (cDADM). The discrimination-aware tool support in the eDADM and cDADM treatments led to significantly higher proportions of correct decisions, which were also motivated more accurately. There is significant evidence that the relative advantage of discrimination-aware techniques depends on their intended usage. For users focussed on making and motivating their decisions in non-discriminatory ways, cDADM resulted in more accurate and less discriminatory results than eDADM. For users focussed on monitoring decisions in order to prevent discrimination, and on motivating these conclusions, eDADM yielded more accurate results than cDADM.
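The contrast between the two approaches named in the abstract, suppressing versus flagging, can be illustrated with a minimal sketch. It is not the tool used in the study: the `Rule` type, the feature strings, and the `SENSITIVE` list are hypothetical, and the sketch does not show how discriminatory patterns would be discovered in the first place. It only shows what happens to a rule once it is known to involve a potentially discriminatory feature.

```python
from dataclasses import dataclass

# Hypothetical enumeration of potentially discriminatory features (assumption,
# chosen for illustration only; not taken from the article).
SENSITIVE = frozenset({"gender=female", "age>60", "foreign_worker=yes"})


@dataclass(frozen=True)
class Rule:
    """A mined rule: if all premise features hold, predict the outcome."""
    premise: frozenset  # features such as "gender=female"
    outcome: str        # "approve" or "deny"


def constraint_oriented(rules):
    """cDADM-style view: suppress every rule that uses a listed feature."""
    return [r for r in rules if not (r.premise & SENSITIVE)]


def exploratory(rules):
    """eDADM-style view: keep every rule, paired with a warning flag."""
    return [(r, bool(r.premise & SENSITIVE)) for r in rules]


rules = [
    Rule(frozenset({"savings>1000"}), "approve"),
    Rule(frozenset({"gender=female", "housing=rent"}), "deny"),
]

shown = constraint_oriented(rules)  # only the savings rule remains
flagged = exploratory(rules)        # both rules remain, the second is flagged
```

In this sketch the constraint-oriented view hides the second rule from the user entirely, whereas the exploratory view leaves it visible and marks it, so that the user can see that such a pattern exists in the data.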

Notes

  1. Sections 2 and 3.1–3.3 extend a previous workshop paper (Berendt and Preibusch 2012), and Sect. 3.4 summarises the user study presented in detail in that paper.

  2. Otherwise called, e.g., “potentially discriminatory (PD) items” (Pedreschi et al. 2008) or “sensitive attributes” (Hajian and Domingo-Ferrer 2013; Kamiran et al. 2010). A feature or item is an attribute with a value or value range; thus for example “gender” is an attribute and “female” a feature. All three terms refer to the formal representation of legal grounds of discrimination (the reasons specified by the law that will serve as a basis for demanding relief) and other grounds in the databases used for data mining. While Pedreschi et al. (2008) point out that PD items may comprise more than just legally-defined sensitive attributes, they still assume a priori knowledge about these items.

  3. “Bad patterns” correspond to, e.g., “α-discriminatory rules” in Pedreschi et al. (2008).

  4. See for example Hajian et al. (2011), Kamiran et al. (2010) for measures of utility.

  5. E.g. the “actuarial factors related to sex” discussed in Sect. 2.1.

  6. E.g. “Differences in treatment may be accepted only if they are justified by a legitimate aim. A legitimate aim may, for example, be the protection of victims of sex-related violence (in cases such as the establishment of single-sex shelters), reasons of privacy and decency (in cases such as the provision of accommodation by a person in a part of that person’s home), the promotion of gender equality or of the interests of men or women (for example single-sex voluntary bodies), the freedom of association (in cases of membership of single-sex private clubs), and the organisation of sporting activities (for example single-sex sports events).” (EU 2004, Recital (16)).

  7. E.g. “Any limitation should nevertheless be appropriate and necessary in accordance with the criteria derived from case law of the Court of Justice of the European Communities.” (EU 2004, Recital (16))

  8. We claim this analogy due to the focus on hiding and sanitising patterns that privacy-preserving and discrimination-aware data mining share. However, using one does not imply the other, and their relation is in general non-trivial (Hajian 2013; Hajian et al. 2012).

  9. Our focus was not on analysing any specific true lending data, but on how people deal with data mining results that in reality often are or seem to be non-causal, with correlations often going against common sense and referring to features that act as a positive risk factor in one rule and as a negative risk factor in another one. However, we wanted to create a possible loan-related model. We therefore used the attributes of the German Credit Dataset (Newman et al. 1998) as well as their values, and added further values to create a sufficient number of features (for example, we converted the binary “foreign worker” attribute into a multi-valued attribute specifying the country of origin of the loan applicant).

  10. The US Census 2012 reports: 85 % (compared to our 98 %) “high school or more”, 28 % (compared to our 44 %) “Bachelor’s degree or more”, 10 % (compared to our 6 %) “advanced degree or more” (http://www.census.gov/compendia/statab/2012/tables/12s0233).

  11. All results reported as significant in the following were significant at α = .01.

  12. The original observation was that when asked “How many animals of each kind did Moses take on the Ark,” most people respond “two,” even though they know that it was Noah, not Moses, who took the animals on the Ark.

  13. Due to the exploratory nature of this analysis, we did not test these values for statistical significance.

References

  • Alhadeff J, Van Alsenoy B, Dumortier J (2011) The accountability principle in data protection regulation: origin, development and future directions. Presented at the privacy and accountability 2011 conference, Berlin, 5–6 Apr 2011. http://ssrn.com/abstract=1933731. 11 Oct 2013

  • Arnott D (2006) Cognitive biases and decision support systems development: a design science approach. Inf Syst J 16(1):55–78

  • Avraham R, Logue KD, Schwarcz D (2013) Understanding insurance anti-discrimination laws. Technical report. U of Michigan law & econ research paper no. 12-017; U of Michigan public law research paper no. 289; U of Texas Law. Law and econ research paper no. 234; Minnesota legal studies research paper no. 12-45. http://dx.doi.org/10.2139/ssrn.2135800. 20 Aug 2013

  • Berendt B (2012) More than modelling and hiding: towards a comprehensive view of web mining and privacy. Data Min Knowl Discov 24(3):697–737

  • Berendt B, Preibusch S (2012) Exploring discrimination: a user-centric evaluation of discrimination-aware data mining. In: Vreeken et al. (2012), pp 344–351

  • Berendt B, Preibusch S, Teltzrow M (2008) A privacy-protecting business-analytics service for online transactions. Int J Electron Commer 12:115–150

  • Boston Consulting Group (2012) The value of our digital identity. Liberty global policy series. http://www.lgi.com/PDF/public-policy/The-Value-of-Our-Digital-Identity.pdf. 20 Aug 2013

  • Bresnahan J, Shapiro M (1966) A general equation and technique for the exact partitioning of chi-square contingency tables. Psychol Bull 66:252–262

  • Calders T, Verwer S (2010) Three naive Bayes approaches for discrimination-free classification. Data Min Knowl Discov 21(2):277–292

  • Chen JQ, Lee SM (2003) An exploratory cognitive DSS for strategic decision making. Decis Support Syst 36(2):147–160

  • Duhigg C (2009) What does your credit-card company know about you? New York Times, 12 May 2009. http://www.nytimes.com/2009/05/17/magazine/17credit-t.html?pagewanted=all&_r=0. 20 Aug 2013

  • Eickhoff C, de Vries AP (2013) Increasing cheat robustness of crowdsourcing tasks. Inf Retr 16(2):121–137

  • Erickson TA, Mattson ME (1981) From words to meaning: a semantic illusion. J Verbal Learn Verbal Behav 20:540–552

  • EU (2004/2012) Council Directive 2004/113/EC of 13 December 2004 implementing the principle of equal treatment between men and women in the access to and supply of goods and services. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2004:373:0037:0043:EN:PDF. 20 Aug 2013

  • EU (2006) Directive 2006/54/EC of the European Parliament and of the Council of 5 July 2006 on the implementation of the principle of equal opportunities and equal treatment of men and women in matters of employment and occupation. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2006:204:0023:0036:EN:PDF. 20 Aug 2013

  • European Commission (2012) How does the data protection reform strengthen citizens’ rights? http://ec.europa.eu/justice/data-protection/document/review2012/factsheets/2_en.pdf. 20 Aug 2013

  • European Court of Justice (2011) Case C-236/09, Association Belge des Consommateurs Test-Achats ASBL and Others v Conseil des ministres. http://curia.europa.eu/juris/liste.jsf?language=en&num=C-236/09. 20 Aug 2013

  • Fayyad UM, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery: an overview. In: Fayyad U, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. MIT Press, Cambridge, MA, pp 1–34

  • Federal Trade Commission (2012) Protecting consumer privacy in an era of rapid change: recommendations for businesses and policymakers. FTC report. http://www.ftc.gov/os/2012/03/120326privacyreport.pdf. 20 Aug 2013

  • Fine C (2010) Delusions of gender. The real science behind sex differences. Icon Books, London

  • Gao B, Berendt B (2011) Visual data mining for higher-level patterns: discrimination-aware data mining and beyond. In: Proceedings of the 20th machine learning conference of Belgium and The Netherlands. http://www.benelearn2011.org/. 20 Aug 2013

  • Goodman J, Cryder C, Cheema A (2012) Data collection in a flat world: the strengths and weaknesses of Mechanical Turk samples. J Behav Decis Mak 26:213–224

  • Gutwirth S, De Hert P (2006) Privacy, data protection and law enforcement. Opacity of the individual and transparency of power. In: Claes E, Duff A, Gutwirth S (eds) Privacy and the criminal law. Intersentia, Antwerp, pp 61–104

  • Hajian S (2013) Simultaneous discrimination prevention and privacy protection in data publishing and mining. PhD thesis, Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, Tarragona, Catalonia

  • Hajian S, Domingo-Ferrer J (2013) Direct and indirect discrimination prevention methods. In: Custers B, Calders T, Schermer B, Zarsky TZ (eds) Discrimination and privacy in the information society, studies in applied philosophy, epistemology and rational ethics, vol 3. Springer, Berlin, pp 241–254

  • Hajian S, Domingo-Ferrer J (2013) A methodology for direct and indirect discrimination prevention in data mining. IEEE Trans Knowl Data Eng 25(7):1445–1459

  • Hajian S, Domingo-Ferrer J, Martínez-Ballesté A (2011) Discrimination prevention in data mining for intrusion and crime detection. In: IEEE SSCI 2011

  • Hajian S, Monreale A, Pedreschi D, Domingo-Ferrer J, Giannotti F (2012) Injecting discrimination and privacy awareness into pattern discovery. In: Vreeken et al. (2012), pp 360–369

  • Heckerman D (2013) From wet to dry: how machine learning and big data are changing the face of biological sciences. http://research.microsoft.com/apps/video/default.aspx?id=189426

  • Kamiran F, Calders T, Pechenizkiy M (2010) Discrimination aware decision tree learning. In: Proceedings of ICDM’10, pp 869–874

  • Kamiran F, Karim A, Verwer S, Goudriaan H (2012) Classifying socially sensitive data without discrimination: an analysis of a crime suspect dataset. In: Vreeken et al. (2012), pp 370–377

  • Kamiran F, Zliobaite I, Calders T (2013) Quantifying explainable discrimination and removing illegal discrimination in automated decision making. Knowl Inf Syst 35(3):613–644

  • Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Considerations on fairness-aware data mining. In: Vreeken et al. (2012), pp 378–385

  • Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Fairness-aware classifier with prejudice remover regularizer. In: ECML/PKDD (2), LNCS, vol 7524, pp 35–50. Springer

  • Kaplan B (2001) Evaluating informatics applications—clinical decision support systems literature review. Int J Med Inform 64(1):15–37

  • Knudsen S (2006) Intersectionality—a theoretical inspiration in the analysis of minority cultures and identities in textbooks. In: Caught in the web or lost in the textbook, pp 61–76. http://iartem.no/documents/caught_in_the_web.pdf. 20 Aug 2013

  • Lewis JR (1995) IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use. Int J Hum–Comput Interact 7(1):57–78. http://hcibib.org/perlman/question.cgi. 31 July 2012

  • Luong BT (2011) Generalized discrimination discovery on semi-structured data supported by ontology. PhD thesis, IMT Institute for Advanced Studies, Lucca, Italy

  • Luong BT, Ruggieri S, Turini F (2011) k-nn as an implementation of situation testing for discrimination discovery and prevention. In: KDD, pp 502–510. ACM

  • Mancuhan K, Clifton C (2012) Discriminatory decision policy aware classification. In: Vreeken et al. (2012), pp 386–393

  • Marghescu D, Rajanen M, Back B (2004) Evaluating the quality of use of visual data-mining tools. In: Proceedings of 11th European conference on IT evaluation, 11–12 Nov 2004, Amsterdam, pp 239–250. Academic Conferences Limited

  • Microsoft (2012) New York City Police Department and Microsoft partner to bring real-time crime prevention and counterterrorism technology solution to global law enforcement agencies. http://www.microsoft.com/en-us/news/Press/2012/Aug12/08-08NYPDPR.aspx. 20 Aug 2013

  • Newman DJ, Hettich S, Blake CL, Merz CJ (1998) UCI repository of machine learning databases. GCD at http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29. 20 Aug 2013

  • Park H, Reder ML (2004) Moses illusion. In: Pohl FR (ed) Cognitive illusions, pp 275–291. Psychology Press, London

  • Pedreschi D, Ruggieri S, Turini F (2008) Discrimination-aware data mining. In: Proceedings of KDD’08, pp 560–568. ACM

  • Pedreschi D, Ruggieri S, Turini F (2009) Integrating induction and deduction for finding evidence of discrimination. In: ICAIL, pp 157–166. ACM

  • Pedreschi D, Ruggieri S, Turini F (2009) Measuring discrimination in socially-sensitive decision records. In: SDM, pp 581–592

  • Pedreschi D, Ruggieri S, Turini F (2012) A study of top-k measures for discrimination discovery. In: SAC ’12, pp 126–131. ACM, New York, NY, USA

  • Perer A, Shneiderman B (2009) Integrating statistics and visualization for exploratory power: from long-term case studies to design guidelines. IEEE Comput Graphics Appl 29(3):39–51

  • Pitt G (2009) Genuine occupational requirements. EC anti-discrimination legislation for legal practitioners, 27–28 Apr 2009, Trier, Germany. http://www.era-comm.eu/oldoku/Adiskri/05_Occupational_requirements/2009_Pitt_EN.pdf. 20 Aug 2013

  • Plaisant C (2004) The challenge of information visualization evaluation. In: Costabile MF (ed) AVI, pp 109–116. ACM Press, New York

  • Romei A, Ruggieri S (2014) A multidisciplinary survey on discrimination analysis. Knowl Eng Rev (to appear). doi:10.1017/S0269888913000039

  • Ruggieri S, Pedreschi D, Turini F (2010) Data mining for discrimination discovery. TKDD ACM Trans Knowl Discov 4(2):1–40

  • Ruggieri S, Pedreschi D, Turini F (2010) DCUBE: discrimination discovery in databases. In: Proceedings of SIGMOD’10, pp 1127–1130

  • Schanze E (2013) Injustice by generalization. Notes on the Test-Achats decision of the European Court of Justice. Ger Law J 14(2):423–433

  • Sedlmair M, Meyer M, Munzner T (2012) Design study methodology: reflections from the trenches and the stacks. IEEE Trans Vis Comput Graphics 18(12):2431–2440

  • Shearer C (2000) The CRISP-DM model: the new blueprint for data mining. J Data Warehous 5(4):13–22

  • Sykes JB (ed) (1982) The concise Oxford dictionary, 7th edn. Oxford University Press, Oxford

  • Vreeken J, Ling C, Zaki MJ, Siebes A, Yu JX, Goethals B, Webb GI, Wu X (eds) (2012) 12th IEEE ICDM workshops, Brussels, Belgium, 10 Dec 2012. IEEE Computer Society

  • Yin X, Han J (2003) Cpar: classification based on predictive association rules. In: Barbará D, Kamath C (eds) SDM. SIAM, Philadelphia, PA

  • Zuccon G, Leelanupab T, Whiting S, Yilmaz E, Jose JM, Azzopardi L (2013) Crowdsourcing interactions: using crowdsourcing for evaluating interactive information retrieval systems. Inf Retr 16(2):267–305

Acknowledgements

We thank Brendan Van Alsenoy and Albrecht Zimmermann for many inspiring discussions and valuable comments on an earlier version of the paper. We also thank the Flemish Agency for Innovation through Science and Technology (IWT) and the Fonds Wetenschappelijk Onderzoek – Vlaanderen (FWO) for support through the projects SPION (Grant Number 100048) and Data Mining for Privacy in Social Networks (Grant Number 65269), respectively.

Author information

Corresponding author

Correspondence to Bettina Berendt.

About this article

Cite this article

Berendt, B., Preibusch, S. Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence. Artif Intell Law 22, 175–209 (2014). https://doi.org/10.1007/s10506-013-9152-0
