
1 Introduction

“Perfect objectivity is an unrealistic goal; fairness, however, is not.” –M. Pollan, 2004

The current and upcoming application of machine learning to real-life problems is overwhelming. Applications have enormous consequences in people’s lives, and impact decisions on education, economy, health care, and climate policies. The issue is certainly relevant: new social and economic activities massively exploit big data and machine learning algorithms to make inferences, for instance to select the best curriculum vitae to fill a position [15], to determine wages and in pre-trial risk assessment [4, 9], and to evaluate the risk of violence [8]. Companies, governments and institutions have raised concerns about the lack of fairness, equity and ethics in machine learning when treating this kind of problem (Footnote 1). Machine learning methods are currently far from being fair, just, or equitable in any way: after all, standard pattern analysis is about model fitting, not about gender issues. Undoubtedly, attaining fair machine learning algorithms is a timely and important concern. Fairness is an elusive concept, though, and so is the inclusion of such a qualitative notion in machines that only learn from data.

Several approaches exist in the literature to account for fairness in machine learning. One of the earliest approaches tackled the bias problem through the definition of classification rules [22, 24]. Later, other works focused mainly on pre-processing the data [12, 16, 21, 23]: down-weighting sensitive features or directly removing them have been the preferred choices. Perhaps the most naive approach is to simply discard the sensitive input features that bias discrimination [31]. Removing gender, disability or race to predict monthly income is, however, not a good choice, because the model’s accuracy may be largely impacted by the lack of informative features, and because other correlated features enter the model anyway. This effect is known in statistics as the omitted variable bias [6].

Another simple approach consists of including ad hoc weights and data normalization to match the prior belief about fairness. Noting that data pre-processing is a rather arbitrary approach, Kamiran and Calders proposed three solutions to learn fair classifiers [17]. These classifiers essentially used the sensitive features only during learning, not at prediction time. A step forward in this direction was presented in [12], where the authors proposed pre-processing the data by removing from all attributes the information correlated with the sensitive variable. The intuition behind this approach is that training on discrimination-free data is likely to yield more equitable predictions. A discussion of several more algorithms for binary protected and outcome variables can be found in [18]. Other authors have focused on finding transformations of the input space in order to extract features that do not retain information about the sensitive input variables [30].

All in all, the relevance of fair methods in machine learning is ever increasing, and a wide body of literature and approaches exists. In this paper we focus on a field known as ‘disparate impact’, in which outcomes should not differ based on individuals’ protected class membership. Many definitions for the elusive concept of fairness in machine learning are available (see [3, 5, 7, 11, 12, 16, 22, 29]): redlining, negative legacy, underestimation or subset targeting, to name a few. We frame our methods in the ‘indirect discrimination’ subfield.

Recently, an interesting regularization framework for fair classification was proposed in [19]. The framework optimizes a functional that jointly minimizes the classification error and the dependence between predictions and the sensitive variables using mutual-information concepts. We build our proposal upon this framework and extend it to regression and to unsupervised dimensionality reduction problems with kernel methods. The proposed kernel machines exploit cross-covariance operators in Hilbert spaces, which brings both theoretical and empirical advantages. Conveniently, the solutions only involve solving simple matrix inversions or generalized eigenproblems. This makes it possible to explore different solutions as the trade-off between prediction and fairness is modified. The methods naturally deal with multidimensional unprotected as well as sensitive variables. Note that this is especially important for the fairness term, where a robust measure of dependence is needed. On top of this, the proposed methods can incorporate prior knowledge about the fairness, invariances and interestingness of the feature relations. We illustrate performance on toy data as well as on two real problems: income prediction subject to gender and/or race discrimination, and contraceptive method prediction under demographic and socio-economic sensitive descriptors.

The remainder of the paper is organized as follows. Section 2 describes the problem statement, introduces notation and presents the fair kernel regression framework in the input and the Hilbert space. Section 3 extends the fair kernel learning framework to dimensionality reduction problems. Toy examples guide the presentation of the two approaches. Experimental evidence of performance is given in Sect. 4. Conclusions finalize the paper in Sect. 5.

2 Fair Regression Framework

This section starts by defining the notation and the concept of fair predictions. Then we introduce the proposed framework for performing fair regression learning based on cross-covariance operators for dependence estimation in Hilbert spaces. We conclude with an illustrative example.

2.1 Notation, Preliminaries, and the Regularization Framework

Let us define the notation and the problem setting. We are given n samples of a response (or target) data matrix \({\mathbf Y}\in \mathbb R^{n\times c}\), and \(d+q\) prediction variables: d unprotected \({\mathbf X}_u\in \mathbb R^{n\times d}\) and q sensitive \({\mathbf S}\in \mathbb R^{n\times q}\). The goal is to obtain an accurate prediction function (or model) f for the target variable \({\mathbf Y}\) from the input data, \({\mathbf X}=({\mathbf X}_u,{\mathbf S})\). This function is said to be totally fair if the predictions are statistically independent of the sensitive (protected) features [5, 10].

Therefore, two main ingredients are needed to perform fair predictions: we need to ensure independence of the predictions from the sensitive variables, and simultaneously to obtain a good approximation of the target variables. The regularization framework proposed in [19] tackles the problem of finding a fair function f for classification by including a term that enforces fair classification. In our proposal, the function f tries to learn the relation between observed input-output data pairs \(({\mathbf x}_1,{\mathbf y}_1),\ldots ,({\mathbf x}_n,{\mathbf y}_n)\in {\mathcal X}\times {\mathcal Y}\) such that it generalizes well (good predictions \(\hat{{\mathbf y}}_*=f({\mathbf x}_*)\in {\mathcal Y}\) for an unseen input data point \({\mathbf x}_*\in {\mathcal X}\)), while the predictions remain as independent as possible of the sensitive features. Then, the following functional needs to be optimized:

$$\begin{aligned} {\mathcal L} = \dfrac{1}{n}\sum _{i=1}^n V(f({\mathbf x}_i),{\mathbf y}_i) + \lambda ~\varOmega (\Vert f\Vert _{\mathcal H}) + \mu ~I(f({\mathbf x}),\mathbf{s}), \end{aligned}$$
(1)

where V is the error cost function, \(\varOmega (\Vert f\Vert _{\mathcal H})\) acts as a regularizer of the predictive function and controls the smoothness and complexity of the model, and \(I(f({\mathbf x}),\mathbf{s})\) measures the dependence between the model’s predictions and the protected variables. Note that one aims to minimize the amount of information that the model shares with the sensitive variables while controlling the trade-off between fitting and independence through the hyperparameters \(\lambda \) and \(\mu \). By setting \(\mu =0\) one obtains the ordinary Tikhonov-regularized functional, and by setting \(\lambda =0\) one obtains the unregularized version of this framework.

The framework admits many variants depending on the cost function V, the regularizer \(\varOmega \) and the independence measure I. For example, in [19], the function f was the logistic regression classifier and I was a simplification of the mutual information estimate. Despite the good results reported in [19], these choices did not allow solving the problem in closed form, nor coping with more than one sensitive variable at a time, since the proposed mutual information estimate is a one-dimensional dependence measure. In the following, we elaborate on this framework using the concept of cross-covariance operators in Hilbert spaces, which leads to closed-form solutions and allows dealing with several sensitive variables simultaneously.

2.2 Fair Linear Regression

Let us now provide a straightforward instantiation of the proposed framework for fair linear regression (FLR). We will adopt a linear predictive model for f, i.e. the matrix of predictions for a test data matrix \({\mathbf X}_*\) is given by \(\hat{\mathbf Y}_*= {\mathbf X}_*{\mathbf W}\), the mean square error for the cost function \(V = \Vert {\mathbf Y}-{\mathbf X}{\mathbf W}\Vert _2^2\) and the standard \(\ell _2\) regularization for model weights \(\varOmega := \Vert {\mathbf W}\Vert _2^2\). Other choices could be taken, leading to alternative formulations. In order to measure dependence, we will rely on the cross-covariance operator between the predictions and the sensitive variables in Hilbert space. Let us consider two spaces \({\mathcal Y}\subseteq \mathbb R^{c}\) and \({\mathcal S}\subseteq \mathbb R^{q}\), where random variables \((\hat{\mathbf y},\mathbf{s})\) are sampled from the joint distribution \({\mathbb P}_{{\mathbf y}\mathbf{s}}\). Given a set of pairs \({\mathcal D}=\{(\hat{\mathbf y}_1,\mathbf{s}_1),\ldots ,(\hat{\mathbf y}_n,\mathbf{s}_n)\}\) of size n drawn from \({\mathbb P}_{{\mathbf y}\mathbf{s}}\), an empirical estimator of HSIC [14] allows us to define

$$I:=\text {HSIC}({\mathcal Y},{\mathcal S},{\mathbb P}_{{\mathbf y}\mathbf{s}}) = \Vert {\mathbf C}_{ys}\Vert _{\text {HS}}^2 = \frac{1}{n^2}\Vert \tilde{\mathbf Y}^\top \tilde{\mathbf S}\Vert ^2 = \frac{1}{n^2}\text {Tr}(\tilde{\mathbf Y}^\top \tilde{\mathbf S} \tilde{\mathbf S}^\top \tilde{\mathbf Y}),$$

where \(\Vert \cdot \Vert _{\text {HS}}\) is the Hilbert-Schmidt norm, \({\mathbf C}_{ys}\) is the empirical cross-covariance matrix between predictions and sensitive variables (Footnote 2), \(\tilde{\mathbf Y}\) and \(\tilde{\mathbf{S}}\) represent the feature-centered \({\mathbf Y}\) and \(\mathbf{S}\) respectively, and \(\text {Tr}(\cdot )\) denotes the trace of a matrix. We want to stress that HSIC allows us to estimate dependencies between multidimensional variables, and that this linear HSIC estimate is zero if and only if there is no second-order dependence between \(\hat{\mathbf y}\) and \(\mathbf{s}\). In the next section we extend the formulation to higher-order dependencies with the use of kernels [25, 26].
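For concreteness, a minimal NumPy sketch of this linear HSIC estimate follows (the function and variable names are ours, not part of the original formulation):

```python
import numpy as np

def linear_hsic(Y_hat, S):
    """Empirical HSIC with linear kernels: (1/n^2) * ||Yc^T Sc||_F^2,
    where Yc and Sc are the column-centered predictions and sensitive variables."""
    n = Y_hat.shape[0]
    Yc = Y_hat - Y_hat.mean(axis=0)     # feature (column) centering
    Sc = S - S.mean(axis=0)
    C = Yc.T @ Sc                       # n times the empirical cross-covariance C_ys
    return np.linalg.norm(C, 'fro')**2 / n**2
```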

Plugging these definitions of f, V, \(\varOmega \) and I into Eq. (1), one can easily show that the weight estimates have the following closed-form solution

$$\begin{aligned} \widehat{\mathbf W}= (\tilde{\mathbf X}^\top \tilde{\mathbf X}+ \lambda ~\mathbf{I} + \frac{\mu }{n^2}~\tilde{\mathbf X}^\top \tilde{\mathbf S} \tilde{\mathbf S}^\top \tilde{\mathbf X})^{-1} \tilde{\mathbf X}^\top {\mathbf Y}, \end{aligned}$$
(2)

where fairness is trivially controlled with \(\mu \), which acts as an additional regularization term. Also note that when \(\mu =0\) the ordinary regularized least squares solution is obtained.
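A direct implementation of Eq. (2) could look as follows; this is a sketch in which the inputs are assumed to be feature-centered beforehand and the hyperparameter values are placeholders:

```python
import numpy as np

def fair_linear_regression(Xc, Sc, Y, lam=1e-3, mu=1.0):
    """Closed-form fair linear regression weights, Eq. (2).
    Xc: (n, d) feature-centered inputs, Sc: (n, q) feature-centered sensitive
    variables, Y: (n, c) targets; lam and mu follow Eq. (1)."""
    n, d = Xc.shape
    A = Xc.T @ Xc + lam * np.eye(d) + (mu / n**2) * (Xc.T @ Sc) @ (Sc.T @ Xc)
    return np.linalg.solve(A, Xc.T @ Y)   # solve the linear system instead of inverting A

# Predictions for (centered) test inputs: Y_hat = Xc_test @ W
```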

2.3 Fair Kernel Regression

Let us now extend the previous model to the nonlinear case in terms of the prediction function, the regularizer and the dependence measure by means of reproducing kernels [25, 26]. We call this method the fair kernel regression (FKR) model. We proceed in the standard way in kernel machines by mapping data \({\mathbf X}\) and \({\mathbf S}\) to a Hilbert space \({\mathcal H}\) via the mapping functions \(\phi (\cdot )\) and \(\psi (\cdot )\) respectively. This yields \({\varvec{\varPhi }},{\varvec{\varPsi }}\in {\mathcal H}\subseteq \mathbb R^{d_{\mathcal H}}\), where \(d_{\mathcal H}\) is the (unknown and possibly infinite) dimensionality of mapped points in \({\mathcal H}\). The corresponding kernel matrices can be defined as: \(\tilde{\mathbf{K}}=\tilde{\varvec{\varPhi }}\tilde{\varvec{\varPhi }}^\top \) and \(\tilde{\mathbf{K}}_S=\tilde{\varvec{\varPsi }}\tilde{\varvec{\varPsi }}^\top \). Now the prediction function is \(\hat{\mathbf Y}= {\varvec{\varPhi }}{\mathbf W}_{\mathcal H}\), the regularizer is \(\varOmega := \Vert {\mathbf W}_{\mathcal H}\Vert _2^2\), and the dependence measure I is the HSIC estimate between predictions \(\hat{\mathbf Y}\) and sensitive variables \({\mathbf S}\), which can now be estimated in Hilbert spaces: \(I:=\text {HSIC}({\mathcal Y},{\mathcal H},{\mathbb P}_{{\mathbf y}\mathbf{s}}) = \Vert {\mathbf C}_{ys}\Vert _{\text {HS}}^2\). Now, by plugging all these terms in the cost function, using the representer’s theorem \({\mathbf W}_{\mathcal H} = \tilde{{\varvec{\varPhi }}}^\top \varvec{\varLambda }\) and after some simple linear algebra, we obtain the dual weights in closed-form

$$\begin{aligned} \varvec{\varLambda } = (\tilde{\mathbf{K}} + \lambda \mathbf{I} + \frac{\mu }{n^2} \tilde{\mathbf{K}}\tilde{\mathbf{K}}_S)^{-1} {\mathbf Y}, \end{aligned}$$
(3)

which can be used for prediction on a new point \({\mathbf x}_*\) via \(\hat{\mathbf y}_*= \mathbf{k}_*\varvec{\varLambda }\), where \(\mathbf{k}_*= [K({\mathbf x}_*,{\mathbf x}_1),\ldots ,K({\mathbf x}_*,{\mathbf x}_n)]^\top \). When \(\mu = 0\) the method reduces to the standard kernel ridge regression (KRR) method [26]. Centering points in feature space can be done implicitly with kernels [26]: a kernel matrix \(\mathbf{K}\) is centered as \(\tilde{\mathbf{K}} = \mathbf{H}\mathbf{K}\mathbf{H}\), where \(\mathbf{H}= \mathbf{I} - \frac{1}{n}\mathbbm {1}\mathbbm {1}^\top \).
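The following sketch puts Eq. (3) together with the kernel centering step; the RBF kernel and all hyperparameter values are illustrative assumptions rather than prescribed choices:

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf(A, B, sigma):
    """RBF (Gaussian) kernel matrix between the rows of A and B."""
    return np.exp(-cdist(A, B, 'sqeuclidean') / (2 * sigma**2))

def fair_kernel_regression(X, S, Y, sigma=1.0, sigma_s=1.0, lam=1e-3, mu=1.0):
    """Dual weights of fair kernel regression, Eq. (3), with centered kernels."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix H
    K = H @ rbf(X, X, sigma) @ H                   # centered input kernel
    Ks = H @ rbf(S, S, sigma_s) @ H                # centered sensitive kernel
    return np.linalg.solve(K + lam * np.eye(n) + (mu / n**2) * K @ Ks, Y)

# Prediction for a test point x_star: y_hat = k_star @ Lmb,
# with k_star = rbf(x_star.reshape(1, -1), X, sigma).
```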

Lemma 1

Both KRR and FKR model weights are bounded in norm by the same quantity.

Proof

Let us assume the same kernel matrix \({\tilde{\mathbf{K}}}\) for KRR and FKR, and suppose \(\lambda ,\mu \ge 0\); then the following bound is satisfied: \(\Vert (\tilde{\mathbf{K}} + \lambda \mathbf{I} + \frac{\mu }{n^2} \tilde{\mathbf{K}}_S\tilde{\mathbf{K}})^{-1}\Vert \le \Vert (\tilde{\mathbf{K}} + \lambda \mathbf{I})^{-1}\Vert \). Given \(\mu \ge 0\), the following inequality holds, with \(\succeq \) denoting the standard PSD order: \(\tilde{\mathbf{K}} + \lambda \mathbf{I} + \frac{\mu }{n^2} \tilde{\mathbf{K}}_S\tilde{\mathbf{K}}\succeq \tilde{\mathbf{K}} + \lambda \mathbf{I}\). It then also holds that \((\tilde{\mathbf{K}} + \lambda \mathbf{I} + \frac{\mu }{n^2} \tilde{\mathbf{K}}_S\tilde{\mathbf{K}})^{-1}\preceq (\tilde{\mathbf{K}} + \lambda \mathbf{I})^{-1}\), and by taking norms we obtain \(\Vert (\tilde{\mathbf{K}} + \lambda \mathbf{I} + \frac{\mu }{n^2} \tilde{\mathbf{K}}_S\tilde{\mathbf{K}})^{-1}\Vert \le \Vert (\tilde{\mathbf{K}} + \lambda \mathbf{I})^{-1}\Vert \). The FKR model weights can thus be bounded as

$$\begin{aligned} \begin{aligned} \Vert \varvec{\varLambda }_{\text {FKR}}\Vert&= \left\| (\tilde{\mathbf{K}} + \lambda \mathbf{I} + \frac{\mu }{n^2} \tilde{\mathbf{K}}_S\tilde{\mathbf{K}})^{-1} {\mathbf Y}\right\| \le \left\| (\tilde{\mathbf{K}} + \lambda \mathbf{I} + \frac{\mu }{n^2} \tilde{\mathbf{K}}_S\tilde{\mathbf{K}})^{-1}\right\| \left\| {\mathbf Y}\right\| \\&\le \left\| (\tilde{\mathbf{K}} + \lambda \mathbf{I})^{-1}\right\| \left\| {\mathbf Y}\right\| , \end{aligned} \end{aligned}$$
(4)

which is the same bound for KRR weights:

$$\Vert \varvec{\varLambda }_{\text {KRR}}\Vert = \left\| (\tilde{\mathbf{K}} + \lambda \mathbf{I})^{-1} {\mathbf Y}\right\| \le \left\| (\tilde{\mathbf{K}} + \lambda \mathbf{I})^{-1}\right\| \left\| {\mathbf Y}\right\| .$$

Illustrative example. Here we illustrate the performance of the proposed methods in a controlled synthetic experiment. The data consider a sensitive variable drawn from a zero-mean Gaussian with standard deviation \(\sigma _s\), \(\mathbf{s}\sim {\mathcal N}(0,\sigma _s)\), and a parametric function \(f_s(\mathbf{s})\) that yields an intermediate variable \(\mathbf{a}\) buried in additive white Gaussian noise (AWGN), i.e. \({\mathbf a} = f_s({\mathbf s})\ +\ \mathbf{n}_f\), where \(\mathbf{n}_f\sim {\mathcal N}(0,\sigma _f)\). The system’s output combines both the sensitive variable and its arbitrarily transformed version, again affected by AWGN: \({\mathbf y}=g_s({\mathbf s}) + g_r({\mathbf a}) + \mathbf{n}_y\), where \(\mathbf{n}_y\sim {\mathcal N}(0,\sigma _y)\). In this example we used \(f_s(x) = \log (x)\) and \(g_r(x) = g_s(x) = x^2\). This system ensures that, even without using the variable \({\mathbf s}\) explicitly as an input factor in the regression model, the information conveyed in \({\mathbf s}\) enters through \({\mathbf a}\) indirectly. Two settings are considered, with and without the sensitive variable \({\mathbf s}\) as an input feature. In both experiments we used the RBF kernel function and fitted the hyperparameters (\(\lambda \) and the kernel widths) to be optimal for each \(\mu \) value. Figure 1 shows the results for the four different configurations (linear and nonlinear, with and without \(\mathbf{s}\)); the horizontal axis represents the mean square error (MSE) of the prediction and the vertical axis the HSIC between the prediction and the protected variable. An ideal fair model would obtain zero MSE and zero HSIC. For each configuration, we give the family of solutions obtained by modifying the parameter \(\mu \). The classical solutions that do not include the fairness term show that KRR improves the ordinary LR results in MSE terms, but both methods obtain similar HSIC values. On the other hand, excluding the sensitive variable \(\mathbf{s}\) from the input features yields fairer results in HSIC terms but worse results in MSE terms. The fairness paths are obtained for different \(\mu \) values. The nonlinear regression methods generally outperform their linear counterparts. Including the sensitive variable as an input returns better trade-off results: for example, FKR can be tuned to have the same fairness level as KRR\(\backslash \)S while obtaining around 30% lower prediction error. A similar conclusion can be drawn in the linear case, yet the improvement is smaller.
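A possible generation script for this toy data is sketched below. The sample size, noise levels and the use of log|s| (to keep the logarithm real-valued for negative draws of s) are our assumptions, since they are not specified above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
sigma_s, sigma_f, sigma_y = 1.0, 0.1, 0.1            # assumed values (not reported above)

s = rng.normal(0.0, sigma_s, size=(n, 1))            # sensitive variable
a = np.log(np.abs(s)) + rng.normal(0.0, sigma_f, size=(n, 1))   # a = f_s(s) + n_f, f_s = log (abs is ours)
y = s**2 + a**2 + rng.normal(0.0, sigma_y, size=(n, 1))         # y = g_s(s) + g_r(a) + n_y, g = (.)^2

X_with_s = np.hstack([a, s])      # configuration that uses s as an input feature
X_without_s = a                   # configuration that discards s
```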

Fig. 1.
figure 1

Regression curves of the prediction error (MSE) versus the unfairness (independence of predictions with sensitive variables) in four different configurations (FLR and FKR, with and without the sensitive variable \({\mathbf s}\)) and different values of \(\mu \) (crosses indicate \(\mu =0\) solutions).

3 Fair Dimensionality Reduction Framework

Let us now present a different framework for fair machine learning. Rather than optimizing a regression model, we are here concerned with obtaining fair feature representations. We rely on the field of multivariate analysis to develop both linear and nonlinear (kernel) dimensionality reduction methods.

3.1 Fair Dimensionality Reduction

Let us define two training data matrices as before, the full design matrix \({\mathbf X}\in \mathbb R^{n\times d}\) and the sensitive data matrix \({\mathbf S}\in \mathbb R^{n\times q}\), as well as a labelled data matrix \({\mathbf Y}\in \mathbb R^{n\times c}\) (here we use 1-of-c encoding). The goal here is to find a fair projection matrix \({\mathbf V}\in \mathbb R^{d\times n_p}\) such that the projected data \({\mathbf X}{\mathbf V}\) keep as much information as possible from the input features yet are minimally aligned with the protected, sensitive features. We denote \({\mathbf V}=[\mathbf{v}_1|\cdots |\mathbf{v}_{n_p}]\), where \(\mathbf{v}_i\) is the i-th projection vector and \(n_p\) is the dimension of the projection subspace. Hereafter, the terms alignment and statistical dependence will be used interchangeably. As before, in order to minimize the alignment (dependence) between the random variables \({\mathbf X}\) and \({\mathbf S}\), we will use the cross-covariance operator, whose empirical estimate reduces to computing the norm of the corresponding empirical cross-covariance, given by the HSIC estimator. We will also use HSIC to maximize the dependence between the projected and the original data. The problem can thus be easily formalized as the maximization of the following Rayleigh quotient:

$$\begin{aligned} {\mathbf V}^*= {\mathop {\arg \mathrm {max}} \limits _{\mathbf V}} \left\{ \dfrac{\text {HSIC}(\tilde{\mathbf X}{\mathbf V},\tilde{\mathbf X})}{\text {HSIC}(\tilde{\mathbf X}{\mathbf V},\tilde{\mathbf S})} \right\} = {\mathop {\arg \mathrm {max}} \limits _{\mathbf V}} \left\{ \dfrac{\text {Tr}( {\mathbf V}^\top (\tilde{\mathbf X}^\top \tilde{\mathbf X}\tilde{\mathbf X}^\top \tilde{\mathbf X}){\mathbf V})}{\text {Tr}({\mathbf V}^\top (\tilde{\mathbf X}^\top \tilde{\mathbf S}\tilde{\mathbf S}^\top \tilde{\mathbf X}){\mathbf V})} \right\} , \end{aligned}$$

where \(\tilde{\mathbf X}\) represents the feature-centered \({\mathbf X}\). This leads to solving a generalized eigenvalue problem with the empirical covariance \({\mathbf C}_{xx}=\frac{1}{n}\tilde{\mathbf X}^\top \tilde{\mathbf X}\) and input-sensitive cross-covariance \({\mathbf C}_{xs}=\frac{1}{n}\tilde{\mathbf X}^\top \tilde{\mathbf S}\):

$$\begin{aligned} {\mathbf C}_{xx}{\mathbf C}_{xx}^\top ~{\mathbf{v}} = \lambda {\mathbf C}_{xs}{\mathbf C}_{xs}^\top ~{\mathbf{v}}. \end{aligned}$$

The solution resembles that of standard orthonormalized partial least squares (OPLS) [27]. Note that the generalized eigenproblem involves symmetric matrices. The projection matrix \({\mathbf V}\) can then be used to obtain fair scores for a new data point \({\mathbf x}_*\in \mathbb R^{d\times 1}\) as \(\tilde{{\mathbf x}}_*' = {\mathbf V}^\top {\mathbf x}_*\in \mathbb R^{n_p\times 1}\).
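A compact sketch of this linear FDR solution is given below; the small ridge added to the right-hand-side matrix is our own addition to keep the generalized eigensolver well posed, since \({\mathbf C}_{xs}{\mathbf C}_{xs}^\top\) is typically rank-deficient:

```python
import numpy as np
from scipy.linalg import eigh

def fdr(X, S, n_p, jitter=1e-8):
    """Linear FDR sketch: solve Cxx Cxx^T v = lambda Cxs Cxs^T v and keep the
    n_p leading generalized eigenvectors."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Sc = S - S.mean(axis=0)
    Cxx = Xc.T @ Xc / n
    Cxs = Xc.T @ Sc / n
    A = Cxx @ Cxx.T
    B = Cxs @ Cxs.T + jitter * np.eye(X.shape[1])   # ridge is our assumption
    w, V = eigh(A, B)                               # generalized symmetric eigenproblem
    return V[:, np.argsort(w)[::-1][:n_p]]          # projection matrix V (d x n_p)

# Fair scores for a new sample: x_proj = V.T @ (x_star - X.mean(axis=0))
```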

3.2 Kernel Fair Dimensionality Reduction

Let us now derive a nonlinear version of FDR by means of reproducing kernels [25, 26]. We proceed in the standard way by mapping data \({\mathbf X}\) and \({\mathbf S}\) to a Hilbert space \({\mathcal H}\) via mapping functions \({\phi }(\cdot ), {\psi }(\cdot )\), which yield \({\varvec{\varPhi }}\), \({\varvec{\varPsi }}\in \mathbb R^{n\times d_{\mathcal H}}\) respectively, where \(d_{\mathcal H}\) is the dimensionality of \({\mathcal H}\). The FDR ratio now translates into finding a projection matrix \({\mathbf U}= [\mathbf{u}_1|\cdots |\mathbf{u}_{n_p}]\in \mathbb R^{d_{\mathcal H}\times n_p}\) such that:

$$\begin{aligned} {\mathbf U}^*= {\mathop {\arg \mathrm {max}} \limits _{\mathbf U}} \left\{ \dfrac{\text {Tr}({\mathbf U}^\top ({\tilde{{\varvec{\varPhi }}}}^\top {\tilde{{\varvec{\varPhi }}}}{\tilde{{\varvec{\varPhi }}}}^\top {\tilde{{\varvec{\varPhi }}}}){\mathbf U})}{\text {Tr}({\mathbf U}^\top ({\tilde{{\varvec{\varPhi }}}}^\top {\tilde{{\varvec{\varPsi }}}}{\tilde{{\varvec{\varPsi }}}}^\top {\tilde{{\varvec{\varPhi }}}}){\mathbf U})} \right\} , \end{aligned}$$

where \({\tilde{{\varvec{\varPhi }}}}\), and \({\tilde{{\varvec{\varPsi }}}}\) contain the centered data in Hilbert space. Now, by applying the representer’s theorem \({\mathbf U}=\tilde{{\varvec{\varPhi }}}^\top \varvec{\varLambda }\) (where \(\varvec{\varLambda }= [\varvec{\alpha }_1|\cdots |\varvec{\alpha }_{n_p}]^\top \)), replacing dot products with kernel functions, \(\tilde{k}_x({\mathbf x},{\mathbf x}')=\tilde{{\phi }}({\mathbf x})^\top \tilde{{\phi }}({\mathbf x}')\), \(\tilde{k}_s({\mathbf s},{\mathbf s}')=\tilde{{\psi }}({\mathbf s})^\top \tilde{{\psi }}({\mathbf s}')\), and defining kernel matrices, \(\tilde{\mathbf{K}}_x = \tilde{{\varvec{\varPhi }}}\tilde{{\varvec{\varPhi }}}^\top \), and \(\tilde{\mathbf{K}}_s = \tilde{{\varvec{\varPsi }}}\tilde{{\varvec{\varPsi }}}^\top \), we obtain a dual problem:

$$\begin{aligned} \varvec{\varLambda }^*= {\mathop {\arg \mathrm {max}} \limits _{\varvec{\varLambda }}} \left\{ \dfrac{\text {Tr}(\varvec{\varLambda }^\top (\tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_x )\varvec{\varLambda })}{\text {Tr}(\varvec{\varLambda }^\top ( \tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_s\tilde{\mathbf{K}}_x )\varvec{\varLambda })} \right\} , \end{aligned}$$

which reduces again to solving a generalized eigenproblem:

$$\begin{aligned} \tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_x {\varvec{\alpha }} = \lambda \tilde{\mathbf{K}}_s\tilde{\mathbf{K}}_x {\varvec{\alpha }}. \end{aligned}$$

This problem can be solved iteratively by first computing the leading pair \(\{\lambda _i,\varvec{\alpha }_i\}\), and then deflating the matrices. The deflation equation for KFDR can be written as:

$$\begin{aligned} \tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_x \leftarrow \tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_x - \lambda _i \tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_s\tilde{\mathbf{K}}_x \varvec{\alpha }_i \varvec{\alpha }_i^\top \tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_s\tilde{\mathbf{K}}_x. \end{aligned}$$

which is equivalent to

$$\begin{aligned} \tilde{\mathbf{K}}_x \leftarrow \tilde{\mathbf{K}}_x - \sqrt{\lambda _i} \tilde{\mathbf{K}}_x \tilde{\mathbf{K}}_s\varvec{\alpha }_i, \end{aligned}$$

i.e., at each step we remove from the kernel matrix the best approximation based on the newly extracted projections of the sensitive data, \(\tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_s \varvec{\alpha }_i\). The deflation procedure decreases the rank of the matrix by one, so the maximum number of features that can be extracted with KFDR is \(\text {rank}(\tilde{\mathbf{K}}_x\tilde{\mathbf{K}}_s)\), which for most mapping functions will be \(n_p=\min \{n,c\}\).

The KFDR method is again similar to the KOPLS in [1, 2], but here we seek projections that are independent of the sensitive variables \({\mathbf S}\) while maximizing the retained variance. As for any kernel multivariate analysis method, projecting a new test point \({\mathbf x}_*\in \mathbb R^{d\times 1}\) is possible: \(\tilde{\mathbf x}_*' = {\mathbf U}^\top \tilde{{\phi }}({\mathbf x}_*) = \varvec{\varLambda }^\top \tilde{{\varvec{\varPhi }}}\tilde{{\phi }}({\mathbf x}_*) = \varvec{\varLambda }^\top \tilde{\mathbf{k}}_*\), where \(\tilde{\mathbf{k}}_* = [k_x({\mathbf x}_1,{\mathbf x}_*),\ldots ,k_x({\mathbf x}_n,{\mathbf x}_*)]^\top \).
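A minimal sketch of KFDR is shown below. It works directly with the symmetric trace-ratio form above and, for brevity, keeps the \(n_p\) leading generalized eigenvectors in one shot instead of the component-wise deflation described in the text; the ridge on the denominator matrix is again our addition:

```python
import numpy as np
from scipy.linalg import eigh

def kernel_fdr(Kx, Ks, n_p, jitter=1e-8):
    """Kernel FDR sketch: maximize Tr(L^T Kx Kx Kx L) / Tr(L^T Kx Ks Kx L)
    over dual projections L, keeping the n_p leading generalized eigenvectors."""
    n = Kx.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kxc, Ksc = H @ Kx @ H, H @ Ks @ H               # centered kernel matrices
    A = Kxc @ Kxc @ Kxc                             # numerator matrix (symmetric PSD)
    B = Kxc @ Ksc @ Kxc + jitter * np.eye(n)        # denominator matrix, ridged (our assumption)
    w, U = eigh(A, B)
    return U[:, np.argsort(w)[::-1][:n_p]]          # dual projection matrix Lambda (n x n_p)

# Projection of a test point x_*: x_proj = Lambda.T @ k_star, with k_star the
# (centered) kernel vector between x_* and the training samples.
```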

Invariant feature extraction. This example considers \(n=1000\) points drawn from a sine function buried in noise, \(b_i = \sin (a_i) + n_i\), where \({\mathbf a}\sim {\mathcal U}(0,1.5\pi )\) and \(n_i\sim {\mathcal N}(0,0.1)\). We compare the maps of PCA and FDR and their kernel counterparts. For illustration purposes we consider two different configurations of the inputs, switching the sensitive variable to be either \({\mathbf a}\) or \({\mathbf b}\). Note that this only changes the results for FDR, since PCA and KPCA do not distinguish between sensitive and unprotected variables. Figure 2 shows the first-component projection as a color map in the background for the different methods. Essentially, the linear PCA and FDR methods cannot account for the nonlinear feature relations, but FDR allows one to easily enforce invariance to a pre-specified dimension of interest. Compare, for instance, the first (PCA) and the second (FDR, \({\mathbf S} = {\mathbf a}\)) plots. The first component found by PCA is diagonal, revealing that it carries information about both variables (\({\mathbf a}\) and \({\mathbf b}\)). In contrast, the first component found by FDR is vertical, thus avoiding the information in the horizontal axis, i.e. it is insensitive to the information in \({\mathbf a}\), as expected. Similar effects are observed in the kernel versions, which additionally recover the nonlinear structure of the manifold.

Fig. 2.
figure 2

Linear and kernel feature extraction on a noisy sinusoid distribution. For the sake of simplicity we only show projections onto the first component. The value of the projection is shown as a color map in the background, where dark tones denote small values and brighter tones denote large values. See details in the text. (Color figure online)

Noise-aware feature extraction. We generated a bidimensional banana-shaped distribution corrupted by correlated noise in the \(\pi /4\) direction, to which we want the extracted features to be invariant, cf. Fig. 3. We compare the results of KFDR with those of standard KPCA. For KPCA, projections #2 and #3 capture the noise distribution, while for KFDR all extracted projections are invariant to variations in the \(\pi /4\) direction where the noise is mostly present. The method is intimately related to the kernel signal-to-noise ratio in [13].

Fig. 3.
figure 3

Dimensionality reduction in a noisy two-dimensional example.

4 Experimental Evidence

The aim of this section is to empirically test the proposed methods on real data. We will see that using regular models and simply removing the sensitive variables is not enough to obtain fair solutions. First, we present the data used, and then we evaluate both proposals, regression and dimensionality reduction, using two different datasets from the UCI Machine Learning repository. Source code and illustrative demos are available at http://isp.uv.es/soft_regression.html for the interested reader.

4.1 Datasets

We consider two datasets from the UCI repository [20]: the Adult dataset and the Contraceptive dataset. Both involve sensitive attributes and pose problems of equitable prediction.

Dataset 1: Adult Dataset. This dataset comes readily pre-processed and is available from the libsvm website (Footnote 3); it has been used in several studies on fair machine learning methods for classification and feature extraction [12, 19, 28, 30]. The original Adult data set contains 14 features, of which six are continuous and eight are categorical. Continuous features were discretized into quantiles, and each quantile was represented by a binary feature. Likewise, a categorical feature with m categories was converted into m binary features. In total, the original 14 features are pre-processed into 123 features. Details on how each feature is converted can be found in Table 1. The dataset is already split into two sets: the first one, used for training the models, consists of 32561 instances, and the second one, used for testing, contains 16281 instances. In both the regression and the dimensionality reduction experiments we fit the hyperparameters using 5000 instances for training and 5000 for validation, both randomly selected from the training set. Afterwards we evaluate the models on the whole test set. All the presented results are the mean of twenty-five realizations of each experiment.

Table 1. Original and processed features for the adult dataset from the UCI repository. The type column distinguishes between a continuous (c) or discrete (d) attribute.
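As a purely illustrative sketch of this kind of binarization (the libsvm files already ship in the final 123-feature format, and the exact number of quantiles per attribute is the one reported in Table 1, not the placeholder used here):

```python
import pandas as pd

def binarize_adult(df, continuous_cols, categorical_cols, n_bins=5):
    """Quantile-bin continuous attributes and one-hot encode categorical ones,
    so that every resulting feature is binary (n_bins=5 is a placeholder)."""
    parts = []
    for col in continuous_cols:
        bins = pd.qcut(df[col], q=n_bins, duplicates='drop')   # quantile bins
        parts.append(pd.get_dummies(bins, prefix=col))         # one binary column per bin
    for col in categorical_cols:
        parts.append(pd.get_dummies(df[col], prefix=col))      # m binary columns for m categories
    return pd.concat(parts, axis=1).astype(int)
```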

Dataset 2: Contraceptive Method Choice Data Set. In the second problem we study the drivers of contraceptive method adoption in a cohort of women. We used the Contraceptive Method Choice (CMC) Data Set from the UCI repository, which can be downloaded from https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice. This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who were either not pregnant or did not know whether they were pregnant at the time of the interview. The problem is to predict a woman’s current contraceptive method choice, with three possibilities (‘no use’, ‘long-term methods’, or ‘short-term methods’), based on demographic and socio-economic descriptors. We simplified the problem and considered the binary classes using/not-using a contraceptive method. Table 2 summarizes the total number of features and the class attributes.

Table 2. Original and processed features for the contraceptive method choice data set from the UCI repository. The type column distinguishes between a continuous (c) or discrete (d) attribute.

The data set consists of 1473 samples with 9 features, and one variable to infer, the contraceptive method. In order to train our algorithms, we split the data into train (500 samples), validation (500 samples) and test sets (the remaining 473 samples). The experiment is performed 25 times, and results are averaged to avoid skewed conclusions.

4.2 Experimental Setup

In the regression experiment, we optimize the hyperparameters \(\lambda \) (model regularization), \(\sigma \) (kernel width) and \(\sigma _S\) (the kernel parameter for the dependence estimation) over grids of logarithmically spaced values. Specifically, we tried seven values in the interval \([10^{-4},10^3]\) for \(\lambda \), 10 values in \([10^{-4},10^4]\) for \(\sigma \), and 10 values in \([10^{-1},10^2]\) for \(\sigma _S\). We start by seeking the optimal \(\lambda \) and \(\sigma \) that minimize the error on the validation data. Once these two parameters are fixed, we explore the kernel parameter for the dependence estimation in order to maximize the dependence between the model and the sensitive data. Finally, we try 25 logarithmically spaced values in the interval \([10^{-7},10^3]\) for the fairness hyperparameter \(\mu \) (larger \(\mu \) values imply fairer models).
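For reference, one way to build these hyperparameter grids, assuming base-10 logarithmic spacing with the interval endpoints included:

```python
import numpy as np

lambdas  = np.logspace(-4, 3, 7)    # model regularization, 7 values in [1e-4, 1e3]
sigmas   = np.logspace(-4, 4, 10)   # input kernel width, 10 values in [1e-4, 1e4]
sigmas_s = np.logspace(-1, 2, 10)   # sensitive kernel width, 10 values in [1e-1, 1e2]
mus      = np.logspace(-7, 3, 25)   # fairness trade-off, 25 values in [1e-7, 1e3]
```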

In the FDR experiment the only hyperparameter to tune is \(\sigma _S\), which is optimized to maximize the dependence between the transformed data and the sensitive variables. We optimized this parameter by trying 15 values in the interval \([10^{-5},10^3]\). Different numbers of components \(n_p\) were extracted.

4.3 Results for Fair Regression

We analyze the performance of both the linear (FLR) and the nonlinear kernel (FKR) formulations. As in the toy example, we explore the possibility of including or excluding the sensitive variables \(\mathbf{S}\) in the models. Figure 4 shows the results for different values of \(\mu \). Since the original data were collected for a classification problem, we binarized the outputs (\(c=2\)), treated the task as a regression problem, and afterwards obtained the predicted class via max-vote (taking the maximum of the outputs). We analyze two different situations: one where the methods avoid discriminating only by gender and another where they avoid discriminating by gender and race simultaneously. Note that in the latter case the sensitive variable is bidimensional. While this situation is quite general, using complicated information measures like mutual information (as proposed in [19]) dramatically increases the complexity of the problem. In our case, however, it is straightforward to deal with multidimensional sensitive variables.

Fig. 4.
figure 4

Error in income classification versus (un)fairness of the sensitive variables for the Adult dataset, avoiding discrimination by gender (left), and by both gender and race (right).

In both cases we observe a similar behavior to that of the toy example. Both the linear and kernel classical versions (LR and KRR) obtain relatively good classification error rates, but their dependence on the sensitive variables is relatively high. The fair versions open the possibility of decreasing this dependence while yielding similar classification errors. Results are better when using the kernel version FKR, which is capable of learning a model with a low classification error rate whose predictions are virtually independent of the sensitive features. Including the sensitive variables in our proposed method yields better results in the kernel case. In the linear case, removing the sensitive features has almost no impact on the results.

Fig. 5.
figure 5

Error in contraceptive usage classification versus (un)fairness of the sensitive variables for the CMC dataset. Top row: (left) wife’s education, (right) husband’s education; bottom row: (left) number of children ever born, (right) media exposure.

For the second dataset, we ran the experiments over the following sensitive variables: wife’s education, husband’s education, number of children ever born, and media exposure (features 2, 3, 4 and 9, respectively). The experiments were done considering only one sensitive variable at a time. Figure 5 shows the results for all these protected variables. Several conclusions can be drawn: (1) fair kernel regression outperforms its linear counterpart over the whole hyperparameter space (both in error and in fairness); (2) removing the sensitive feature degrades results, as its information is implicitly conveyed by other included features; and (3) one can achieve arbitrary fairness levels by tuning the \(\mu \) hyperparameter, at the cost of a moderately increased prediction error (a 2–5% increase in classification error).

4.4 Results for Fair Dimensionality Reduction

We analyzed the performance of the proposed dimensionality reduction methods on the income prediction dataset. We present results of a k-nn classifier (\(k=1\)) applied after reducing the dimensionality of the data with different methods: standard Principal Component Analysis (PCA), Kernel PCA (KPCA), and the proposed Fair Dimensionality Reduction (FDR) and its kernel counterpart (KFDR). As in the previous experiment, we analyze the solutions with and without the sensitive features as inputs.
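A sketch of this evaluation pipeline for the linear projections is given below (function names are ours; for the kernel methods the projection step is replaced by the dual projection described in Sect. 3.2):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def knn_error_after_projection(X_tr, y_tr, X_te, y_te, V):
    """Project the data with a (fair) linear projection matrix V and report the
    error rate of a 1-nn classifier on the projected features."""
    mu = X_tr.mean(axis=0)                          # center with training statistics
    Z_tr, Z_te = (X_tr - mu) @ V, (X_te - mu) @ V   # low-dimensional (fair) scores
    knn = KNeighborsClassifier(n_neighbors=1).fit(Z_tr, y_tr)
    return 1.0 - knn.score(Z_te, y_te)              # classification error rate
```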

Figure 6 shows the solutions of the different methods. In the case of PCA and KPCA we show results for different numbers of features, which affect the classification error but barely the fairness score. In both experiments the best fairness-accuracy trade-offs are given by FDR and KFDR when using all variables as inputs. In particular, when avoiding gender discrimination, the proposed framework attains a lower classification error with KFDR. When gender and race are used as sensitive variables, the differences between our proposals and the classical methods are more noticeable: the classification errors are similar, but the dependence achieved by our proposals is several orders of magnitude lower.

Fig. 6.
figure 6

Error rate in income classification versus independence between predictions and the sensitive variables to avoid discrimination by gender (left), and by both gender and race (right). PCA and KPCA have been evaluated for different numbers of features, \(n_p\).

5 Conclusions

We have presented novel fair nonlinear regression and dimensionality reduction methods. We included a term in the cost function, based on the Hilbert-Schmidt independence criterion, which enforces fairness in the solutions and allows dealing with several sensitive variables simultaneously. We presented the methods in a linear fashion and extended them to nonlinear problems using kernel functions. For both the linear and nonlinear cases, the solutions for the regression weights and for the basis functions in dimensionality reduction are expressed in closed form, as they only involve a matrix inversion or a generalized eigenproblem, respectively.

Tuning the fairness hyperparameter in regression allows us to feed the sensitive variables to the regression model while keeping the solution fair. This increases the information available to the model at prediction time rather than simply discarding it. The performance of the methods was successfully illustrated using both synthetic and real data.

We would like to highlight that introducing kernels (and adopting HSIC) for fairness is not incidental: it allows us to achieve closed-form solutions, to trade off fairness and fitness with a single hyperparameter, and to encode prior knowledge in a simple way. Interpretability of the models is obviously an issue and will be explored in the near future. While the framework deals with ‘population fairness’ rather than ‘individual fairness’, this refinement can be easily included in our kernel formulations by defining an individual/group diagonal matrix \(\mathbf {F}\) and replacing \({\mathbf X}\) with \({\mathbf X}\mathbf {F}\) (\(\mathbf {I}\) with \(\mathbf {F}^{-1}\) for the kernel formulations). As future work, we also aim to include kernel conditional independence tests. The proposed framework could be easily extended to other machine learning algorithms, from neural networks to Gaussian processes.