Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence

Berendt, Bettina; Preibusch, Sören

doi:10.1007/s10506-013-9152-0

Published: 10 January 2014

Volume 22, pages 175–209 (2014)

Artificial Intelligence and Law

Abstract

Decision makers in banking, insurance or employment mitigate many of their risks by telling “good” individuals and “bad” individuals apart. Laws codify societal understandings of which factors are legitimate grounds for differential treatment (and when and in which contexts) and which, including gender, ethnicity or age, are considered unfair discrimination. Discrimination-aware data mining (DADM) implements the hope that information technology supporting the decision process can also keep it free from unjust grounds. However, constraining data mining to exclude a fixed enumeration of potentially discriminatory features is insufficient. We argue for complementing it with exploratory DADM, where discriminatory patterns are discovered and flagged rather than suppressed. This article discusses the relative merits of constraint-oriented and exploratory DADM from a conceptual viewpoint. In addition, we consider the case of loan applications to empirically assess the fitness of both discrimination-aware data mining approaches for two of their typical usage scenarios: prevention and detection. Using Mechanical Turk, 215 US-based participants were randomly placed in the roles of a bank clerk (discrimination prevention) or a citizen / policy advisor (detection). They were tasked to recommend or predict the approval or denial of a loan, across three experimental conditions: discrimination-unaware data mining, exploratory DADM (eDADM), and constraint-oriented DADM (cDADM). The discrimination-aware tool support in the eDADM and cDADM treatments led to significantly higher proportions of correct decisions, which were also motivated more accurately. There is significant evidence that the relative advantage of discrimination-aware techniques depends on their intended usage. For users focussed on making and motivating their decisions in non-discriminatory ways, cDADM resulted in more accurate and less discriminatory results than eDADM. For users focussed on monitoring with the aim of preventing discriminatory decisions and motivating these conclusions, eDADM yielded more accurate results than cDADM.
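
The difference between suppressing and flagging patterns can be illustrated with a minimal Python sketch. It is not the tool used in the study: the `Rule` class, the `SENSITIVE_FEATURES` list and the two example rules are hypothetical, and a fixed feature list is used for both views only to keep the contrast visible (the exploratory approach argued for here is meant to discover such patterns rather than rely on a fixed enumeration).

```python
from dataclasses import dataclass

# Hypothetical list of features treated as potentially discriminatory.
SENSITIVE_FEATURES = {"gender=female", "age>60", "foreign_worker=yes"}


@dataclass(frozen=True)
class Rule:
    premise: frozenset  # features in the rule's "if" part
    outcome: str        # e.g. "loan=deny"


def is_sensitive(rule: Rule) -> bool:
    """True if the premise uses any feature from the fixed list."""
    return bool(rule.premise & SENSITIVE_FEATURES)


def constraint_oriented(rules):
    """cDADM-style view: rules with sensitive premises are suppressed."""
    return [r for r in rules if not is_sensitive(r)]


def exploratory(rules):
    """eDADM-style view: every rule is kept, sensitive ones are flagged."""
    return [(r, is_sensitive(r)) for r in rules]


# Made-up example rules, not taken from the study's materials.
rules = [
    Rule(frozenset({"savings=low"}), "loan=deny"),
    Rule(frozenset({"gender=female", "savings=low"}), "loan=deny"),
]

shown_c = constraint_oriented(rules)  # only the first rule remains
shown_e = exploratory(rules)          # both rules, the second one flagged
```

In the constraint-oriented view the user never sees the second rule; in the exploratory view the user sees it together with a warning flag and can reason about it.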

Notes

Sections 2 and 3.1–3.3 extend a previous workshop paper (Berendt and Preibusch 2012), and Sect. 3.4 summarises the user study presented in detail in that paper.
Otherwise called, e.g., “potentially discriminatory (PD) items” (Pedreschi et al. 2008) or “sensitive attributes” (Hajian and Domingo-Ferrer 2013; Kamiran et al. 2010). A feature or item is an attribute with a value or value range; thus for example “gender” is an attribute and “female” a feature. All three terms refer to the formal representation of legal grounds of discrimination (the reasons specified by the law that will serve as a basis for demanding relief) and other grounds in the databases used for data mining. While Pedreschi et al. (2008) point out that PD items may comprise more than just legally-defined sensitive attributes, they still assume a priori knowledge about these items.
“Bad patterns” correspond to, e.g., “α-discriminatory rules” in Pedreschi et al. (2008).
See, for example, Hajian et al. (2011) and Kamiran et al. (2010) for measures of utility.
E.g. the “actuarial factors related to sex” discussed in Sect. 2.1.
E.g. “Differences in treatment may be accepted only if they are justified by a legitimate aim. A legitimate aim may, for example, be the protection of victims of sex-related violence (in cases such as the establishment of single-sex shelters), reasons of privacy and decency (in cases such as the provision of accommodation by a person in a part of that person’s home), the promotion of gender equality or of the interests of men or women (for example single-sex voluntary bodies), the freedom of association (in cases of membership of single-sex private clubs), and the organisation of sporting activities (for example single-sex sports events).” (EU 2004, Recital (16)).
E.g. “Any limitation should nevertheless be appropriate and necessary in accordance with the criteria derived from case law of the Court of Justice of the European Communities.” (EU 2004, Recital (16))
We claim this analogy due to the focus on hiding and sanitising patterns that privacy-preserving and discrimination-aware data mining share. However, using one does not imply the other, and their relation is in general non-trivial (Hajian 2013; Hajian et al. 2012).
Our focus was not on analysing any specific true lending data, but on how people deal with data mining results that in reality often are or seem to be non-causal, with correlations often going against common sense and referring to features that act as a positive risk factor in one rule and as a negative risk factor in another one. However, we wanted to create a possible loan-related model. We therefore used the attributes of the German Credit Dataset (Newman et al. 1998) as well as their values, and added further values to create a sufficient number of features (for example, we converted the binary “foreign worker” attribute into a multi-valued attribute specifying the country of origin of the loan applicant).
The US Census 2012 reports: 85 % (compared to our 98 %) “high school or more”, 28 % (compared to our 44 %) “Bachelor’s degree or more”, 10 % (compared to our 6 %) “advanced degree or more” (http://www.census.gov/compendia/statab/2012/tables/12s0233).
All results reported as significant in the following were significant at α = .01.
The original observation was that when asked “How many animals of each kind did Moses take on the Ark,” most people respond “two,” even though they know that it was Noah, not Moses, who took the animals on the Ark.
Due to the exploratory nature of this analysis, we did not test these values for statistical significance.
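
The “α-discriminatory rules” mentioned in the notes above can be made concrete with a small Python sketch. It assumes the extended-lift definition as we recall it from Pedreschi et al. (2008): the confidence of a rule with a sensitive feature in its premise, divided by the confidence of the same rule without that feature, with the rule counted as α-discriminatory when this ratio is at least α. The function names and the four toy records are hypothetical and are not data from the study.

```python
def confidence(records, premise, outcome):
    """Share of records matching `premise` that also have `outcome`."""
    matching = [r for r in records if premise <= r["features"]]
    if not matching:
        raise ValueError("premise matches no record")
    return sum(r["outcome"] == outcome for r in matching) / len(matching)


def extended_lift(records, sensitive, context, outcome):
    """conf(sensitive + context -> outcome) / conf(context -> outcome)."""
    base = confidence(records, context, outcome)
    if base == 0:
        raise ValueError("context rule has zero confidence")
    return confidence(records, sensitive | context, outcome) / base


def is_alpha_discriminatory(records, sensitive, context, outcome, alpha):
    """True if the extended lift reaches the threshold alpha."""
    return extended_lift(records, sensitive, context, outcome) >= alpha


# Made-up toy records, not data from the study.
records = [
    {"features": {"gender=female", "savings=low"}, "outcome": "deny"},
    {"features": {"gender=female", "savings=low"}, "outcome": "deny"},
    {"features": {"gender=male", "savings=low"}, "outcome": "deny"},
    {"features": {"gender=male", "savings=low"}, "outcome": "grant"},
]

# conf(female, low savings -> deny) = 2/2; conf(low savings -> deny) = 3/4
elift = extended_lift(records, {"gender=female"}, {"savings=low"}, "deny")
```

With these toy records the extended lift is (2/2) / (3/4) = 4/3, so the rule would count as α-discriminatory for α = 1.2 but not for α = 1.5. The threshold itself is a choice the analyst has to make.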

References

Alhadeff J, Van Alsenoy B, Dumortier J (2011) The accountability principle in data protection regulation: origin, development and future directions. Presented at the privacy and accountability 2011 conference, Berlin, 5–6 Apr 2011. http://ssrn.com/abstract=1933731. Accessed 11 Oct 2013
Arnott D (2006) Cognitive biases and decision support systems development: a design science approach. Inf Syst J 16(1):55–78
Avraham R, Logue KD, Schwarcz D (2013) Understanding insurance anti-discrimination laws. Technical report. U of Michigan law & econ research paper no. 12-017; U of Michigan public law research paper no. 289; U of Texas Law, law and econ research paper no. 234; Minnesota legal studies research paper no. 12-45. http://dx.doi.org/10.2139/ssrn.2135800. Accessed 20 Aug 2013
Berendt B (2012) More than modelling and hiding: towards a comprehensive view of web mining and privacy. Data Min Knowl Discov 24(3):697–737
Berendt B, Preibusch S (2012) Exploring discrimination: a user-centric evaluation of discrimination-aware data mining. In: Vreeken et al. (2012), pp 344–351
Berendt B, Preibusch S, Teltzrow M (2008) A privacy-protecting business-analytics service for online transactions. Int J Electron Commer 12:115–150
Boston Consulting Group (2012) The value of our digital identity. Liberty global policy series. http://www.lgi.com/PDF/public-policy/The-Value-of-Our-Digital-Identity.pdf. Accessed 20 Aug 2013
Bresnahan J, Shapiro M (1966) A general equation and technique for the exact partitioning of chi-square contingency tables. Psychol Bull 66:252–262
Calders T, Verwer S (2010) Three naive Bayes approaches for discrimination-free classification. Data Min Knowl Discov 21(2):277–292
Chen JQ, Lee SM (2003) An exploratory cognitive DSS for strategic decision making. Decis Support Syst 36(2):147–160
Duhigg C (2009) What does your credit-card company know about you? New York Times, 12 May 2009. http://www.nytimes.com/2009/05/17/magazine/17credit-t.html?pagewanted=all&_r=0. Accessed 20 Aug 2013
Eickhoff C, de Vries AP (2013) Increasing cheat robustness of crowdsourcing tasks. Inf Retr 16(2):121–137
Erickson TA, Mattson ME (1981) From words to meaning: a semantic illusion. J Verbal Learn Verbal Behav 20:540–552
EU (2004/2012) Council Directive 2004/113/EC of 13 December 2004 implementing the principle of equal treatment between men and women in the access to and supply of goods and services. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2004:373:0037:0043:EN:PDF. Accessed 20 Aug 2013
EU (2006) Directive 2006/54/EC of the European Parliament and of the Council of 5 July 2006 on the implementation of the principle of equal opportunities and equal treatment of men and women in matters of employment and occupation. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2006:204:0023:0036:EN:PDF. Accessed 20 Aug 2013
European Commission (2012) How does the data protection reform strengthen citizens’ rights? http://ec.europa.eu/justice/data-protection/document/review2012/factsheets/2_en.pdf. Accessed 20 Aug 2013
European Court of Justice (2011) Case C-236/09, Association Belge des Consommateurs Test-Achats ASBL and Others v Conseil des ministres. http://curia.europa.eu/juris/liste.jsf?language=en&num=C-236/09. Accessed 20 Aug 2013
Fayyad UM, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery: an overview. In: Fayyad U, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. MIT Press, Cambridge, MA, pp 1–34
Federal Trade Commission (2012) Protecting consumer privacy in an era of rapid change: recommendations for businesses and policymakers. FTC report. http://www.ftc.gov/os/2012/03/120326privacyreport.pdf. Accessed 20 Aug 2013
Fine C (2010) Delusions of gender. The real science behind sex differences. Icon Books, London
Gao B, Berendt B (2011) Visual data mining for higher-level patterns: discrimination-aware data mining and beyond. In: Proceedings of the 20th machine learning conference of Belgium and The Netherlands. http://www.benelearn2011.org/. Accessed 20 Aug 2013
Goodman J, Cryder C, Cheema A (2012) Data collection in a flat world: the strengths and weaknesses of Mechanical Turk samples. J Behav Decis Mak 26:213–224
Gutwirth S, De Hert P (2006) Privacy, data protection and law enforcement. Opacity of the individual and transparency of power. In: Claes E, Duff A, Gutwirth S (eds) Privacy and the criminal law. Intersentia, Antwerp, pp 61–104
Hajian S (2013) Simultaneous discrimination prevention and privacy protection in data publishing and mining. PhD thesis, Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, Tarragona, Catalonia
Hajian S, Domingo-Ferrer J (2013) Direct and indirect discrimination prevention methods. In: Custers B, Calders T, Schermer B, Zarsky TZ (eds) Discrimination and privacy in the information society, studies in applied philosophy, epistemology and rational ethics, vol 3. Springer, Berlin, pp 241–254
Hajian S, Domingo-Ferrer J (2013) A methodology for direct and indirect discrimination prevention in data mining. IEEE Trans Knowl Data Eng 25(7):1445–1459
Hajian S, Domingo-Ferrer J, Martínez-Ballesté A (2011) Discrimination prevention in data mining for intrusion and crime detection. In: IEEE SSCI 2011
Hajian S, Monreale A, Pedreschi D, Domingo-Ferrer J, Giannotti F (2012) Injecting discrimination and privacy awareness into pattern discovery. In: Vreeken et al. (2012), pp 360–369
Heckerman D (2013) From wet to dry: how machine learning and big data are changing the face of biological sciences. http://research.microsoft.com/apps/video/default.aspx?id=189426
Kamiran F, Calders T, Pechenizkiy M (2010) Discrimination aware decision tree learning. In: Proceedings of ICDM’10, pp 869–874
Kamiran F, Karim A, Verwer S, Goudriaan H (2012) Classifying socially sensitive data without discrimination: an analysis of a crime suspect dataset. In: Vreeken et al. (2012), pp 370–377
Kamiran F, Zliobaite I, Calders T (2013) Quantifying explainable discrimination and removing illegal discrimination in automated decision making. Knowl Inf Syst 35(3):613–644
Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Considerations on fairness-aware data mining. In: Vreeken et al. (2012), pp 378–385
Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Fairness-aware classifier with prejudice remover regularizer. In: ECML/PKDD (2), LNCS, vol 7524, pp 35–50. Springer
Kaplan B (2001) Evaluating informatics applications—clinical decision support systems literature review. Int J Med Inform 64(1):15–37
Knudsen S (2006) Intersectionality – a theoretical inspiration in the analysis of minority cultures and identities in textbooks. In: Caught in the web or lost in the textbook, pp 61–76. http://iartem.no/documents/caught_in_the_web.pdf. Accessed 20 Aug 2013
Lewis JR (1995) IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use. Int J Hum–Comput Interact 7(1):57–78. http://hcibib.org/perlman/question.cgi. Accessed 31 July 2012
Luong BT (2011) Generalized discrimination discovery on semi-structured data supported by ontology. PhD thesis, IMT Institute for Advanced Studies, Lucca, Italy
Luong BT, Ruggieri S, Turini F (2011) k-nn as an implementation of situation testing for discrimination discovery and prevention. In: KDD, pp 502–510. ACM
Mancuhan K, Clifton C (2012) Discriminatory decision policy aware classification. In: Vreeken et al. (2012), pp 386–393
Marghescu D, Rajanen M, Back B (2004) Evaluating the quality of use of visual data-mining tools. In: Proceedings of 11th European conference on IT evaluation, 11–12 Nov 2004, Amsterdam, pp 239–250. Academic Conferences Limited
Microsoft (2012) New York City Police Department and Microsoft partner to bring real-time crime prevention and counterterrorism technology solution to global law enforcement agencies. http://www.microsoft.com/en-us/news/Press/2012/Aug12/08-08NYPDPR.aspx. Accessed 20 Aug 2013
Newman DJ, Hettich S, Blake CL, Merz CJ (1998) UCI repository of machine learning databases. German Credit Dataset at http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29. Accessed 20 Aug 2013
Park H, Reder ML (2004) Moses illusion. In: Pohl FR (ed) Cognitive illusions, pp 275–291. Psychology Press, London
Pedreschi D, Ruggieri S, Turini F (2008) Discrimination-aware data mining. In: Proceedings of KDD’08, pp 560–568. ACM
Pedreschi D, Ruggieri S, Turini F (2009) Integrating induction and deduction for finding evidence of discrimination. In: ICAIL, pp 157–166. ACM
Pedreschi D, Ruggieri S, Turini F (2009) Measuring discrimination in socially-sensitive decision records. In: SDM, pp 581–592
Pedreschi D, Ruggieri S, Turini F (2012) A study of top-k measures for discrimination discovery. In: SAC ’12, pp 126–131. ACM, New York, NY, USA
Perer A, Shneiderman B (2009) Integrating statistics and visualization for exploratory power: from long-term case studies to design guidelines. IEEE Comput Graphics Appl 29(3):39–51
Pitt G (2009) Genuine occupational requirements. EC anti-discrimination legislation for legal practitioners, 27–28 Apr 2009, Trier, Germany. http://www.era-comm.eu/oldoku/Adiskri/05_Occupational_requirements/2009_Pitt_EN.pdf. Accessed 20 Aug 2013
Plaisant C (2004) The challenge of information visualization evaluation. In: Costabile MF (ed) AVI, pp 109–116. ACM Press, New York
Romei A, Ruggieri S (2014) A multidisciplinary survey on discrimination analysis. Knowl Eng Rev (to appear). doi:10.1017/S0269888913000039
Ruggieri S, Pedreschi D, Turini F (2010) Data mining for discrimination discovery. ACM Trans Knowl Discov Data 4(2):1–40
Ruggieri S, Pedreschi D, Turini F (2010) DCUBE: discrimination discovery in databases. In: Proceedings of SIGMOD’10, pp 1127–1130
Schanze E (2013) Injustice by generalization. Notes on the Test-Achats decision of the European Court of Justice. Ger Law J 14(2):423–433
Sedlmair M, Meyer M, Munzner T (2012) Design study methodology: reflections from the trenches and the stacks. IEEE Trans Vis Comput Graphics 18(12):2431–2440
Shearer C (2000) The CRISP-DM model: the new blueprint for data mining. J Data Warehous 5(4):13–22
Sykes JB (ed) (1982) The concise Oxford dictionary, 7th edn. Oxford University Press, Oxford
Vreeken J, Ling C, Zaki MJ, Siebes A, Yu JX, Goethals B, Webb GI, Wu X (eds) (2012) 12th IEEE ICDM workshops, Brussels, Belgium, 10 Dec 2012. IEEE Computer Society
Yin X, Han J (2003) CPAR: classification based on predictive association rules. In: Barbará D, Kamath C (eds) SDM. SIAM, Philadelphia, PA
Zuccon G, Leelanupab T, Whiting S, Yilmaz E, Jose JM, Azzopardi L (2013) Crowdsourcing interactions: using crowdsourcing for evaluating interactive information retrieval systems. Inf Retr 16(2):267–305

Acknowledgements

We thank Brendan Van Alsenoy and Albrecht Zimmermann for many inspiring discussions and valuable comments on an earlier version of the paper, and the Flemish Agency for Innovation through Science and Technology (IWT) and the Fonds Wetenschappelijk Onderzoek – Vlaanderen (FWO) for support through the projects SPION (Grant Number 100048) and Data Mining for Privacy in Social Networks (Grant Number 65269), respectively.

Author information

Authors and Affiliations

Department of Computer Science, KU Leuven, Leuven, Belgium
Bettina Berendt
Microsoft Research, Cambridge, UK
Sören Preibusch

Corresponding author

Correspondence to Bettina Berendt.

About this article

Cite this article

Berendt, B., Preibusch, S. Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence. Artif Intell Law 22, 175–209 (2014). https://doi.org/10.1007/s10506-013-9152-0

Published: 10 January 2014
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10506-013-9152-0
