1 Introduction

As machine learning models have become more complex, the need for techniques that explain the decision-making process of black-box models has grown (Molnar et al. 2022; Rudin 2018; Ribeiro et al. 2016). To make this decision-making process more accessible to humans, explanation techniques can be used to estimate the importance of the features of the data to the model's predicted output. Explanation techniques extract this information from the black-box model in a post-hoc manner, i.e., based on a model that has already been trained on a dataset (Molnar et al. 2022; Montavon et al. 2018). Explanations can have different representations, such as logic rules (Ribeiro et al. 2018), example-based explanations (van der Waa et al. 2021) and, arguably the most popular type of explanation in the literature, feature attributions (Ribeiro et al. 2016; Lundberg and Lee 2017). The focus of our study is feature attribution techniques that can explain the predicted output of any class of machine learning model for a single instance in a dataset. These techniques can be further divided into additive and non-additive explanations: the sum of the importance scores in a local additive explanation equals the predicted output score for the explained instance (Lundberg and Lee 2017).

In Rudin (2018), the author argues that local explanations,Footnote 1 such as LIME and SHAP, can be inaccurate and should not be used in high-stakes decision-making domains. The main reason underlying this argument is the infidelity (inaccuracy) of explanations, and the study includes examples of failure cases of explanations in object detection scenarios. Similarly, other studies have evaluated local explanations of neural networks trained on text and image dataFootnote 2. However, the majority of the datasets in high-stakes decision-making scenarios, e.g., health and diagnostics (Hakkoum et al. 2022), law (Wang et al. 2022) and so forth, are tabular datasets. The question of explanation accuracy for models used in these high-stakes domains is therefore of critical importance.

In this work, we propose to evaluate explanation techniques not when explaining black-box models but when explaining linear additive models, such as linear and logistic regression, trained on tabular datasets. In particular, we investigate whether local model-agnostic additive explanations can explain linear additive models with high explanation accuracy. We demonstrate how to extract Model-Intrinsic Additive Scores (MIAS) from these models that can be directly compared to the feature importance scores generated by a local explanation technique (see Sect. 4.1 for the definition of explanation accuracy that we employ in this study and further details).

One might wonder whether the answer to this research question is essential, since linear additive models are intrinsically interpretable and are not representative of black boxes. We show that, since we can extract local ground truth importance scores from linear additive models and measure the explanation accuracy directly, testing the ability to explain these models can serve as a sanity check for evaluating local additive explanation techniques. This evaluation should be the first step when designing new explanation techniques: if an explanation technique cannot accurately explain a simple model, we cannot trust its explanations of black-box models either.

One of the most important aspects of evaluating local explanations is understanding the factors affecting the explanation's accuracy. Several studies have investigated factors that can cause the accuracy of local explanations to decrease (Molnar et al. 2022; Gosiewska and Biecek 2019). The authors point out three main factors that can affect the accuracy of local model-agnostic explanations: (1) the presence of categorical features, (2) the presence of correlated features in the dataset, and (3) explaining models with low predictive performance. Even though these limitations are frequently mentioned in the literature (Molnar et al. 2022; Gosiewska and Biecek 2019; Guidotti 2021), the investigations are not conclusive for tabular datasets, and the degree to which these factors affect the accuracy of explanations has not been studied beyond simple cases of synthetic datasets. Our study investigates the effect of the aforementioned factors on synthetic and real tabular datasets when explaining linear additive models. Moreover, we show that the accuracy of local explanations is affected by additional factors, e.g., the explanation sample size, the choice of similarity metric, and the preprocessing technique used on the dataset.

In our investigation, two widely used techniques for generating local additive model-agnostic explanations, Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al. 2016) and SHapley Additive exPlanations (SHAP) (Lundberg and Lee 2017), are evaluated along with the non-additive explanation technique Local Permutation Importance (LPI) (Casalicchio et al. 2018). The reason for including LPI in our study is to examine whether a technique that does not rely on the additivity of local explanations can still produce accurate explanations for linear additive models. We evaluate the explanation accuracy for regression and classification tasks, using linear regression models for the former and logistic regression and Gaussian naive Bayes models for the latter.

In conclusion, our contributions are:

1. We present a novel principled method to extract the local ground truth model-intrinsic importance scores from additive terms in linear additive models.

2. Based on these scores, we describe how to measure the explanation accuracy of local explanation techniques directly, thus providing a sanity check for these methods.

3. Using our proposed accuracy measure, we show that the previously mentioned factors can indeed influence explanation accuracy.

The key findings from the empirical investigations are: (1) LIME and SHAP pass the proposed sanity check for linear regression models, (2) the explanation techniques frequently fail the proposed sanity check when explaining logistic regression and naive Bayes models, (3) the explanation accuracy of the additive explanations of LIME and SHAP is overall higher than that of the non-additive local explanations of LPI when explaining linear additive models, (4) in some datasets, LPI explanations are more accurate than the explanations of LIME and SHAP when explaining linear additive classification models, even though LPI explanations are not additive, (5) all of the aforementioned factors may significantly affect explanation accuracy, but their effect largely depends on the type of model explained and the explanation technique itself, and (6) the most accurate local explanations are not necessarily the most robustFootnote 3 and vice versa.

The rest of the paper is organized as follows. We provide an extensive background on evaluating local explanations in Sect. 2.2. In Sect. 3, we provide a motivating example that shows the limitations of current evaluation measures for evaluating local explanations of linear additive models and highlights the key differences between the proposed evaluation method and previously proposed approaches. We formally introduce the evaluation method in Sect. 4. In Sect. 5, we empirically study the accuracy of explanation techniques on 40 tabular datasets using the proposed evaluation framework. We discuss the most important findings and the limitations of our study in Sect. 6, and finally, we summarize the main conclusions and point out directions for future research in Sect. 7.

2 Background

Explanation techniques can be divided into global vs. local techniques and model-agnostic vs. model-based techniques. Global explanation techniques (Breiman 2001) provide importance scores for features with respect to an entire dataset (Freitas 2014). Local explanation techniques (Ribeiro et al. 2016; Lundberg and Lee 2017) provide importance scores for the prediction of a single instance. Model-agnostic explanation techniques (Ribeiro et al. 2016) can produce explanations for any type of black-box model. On the other hand, model-based explanation techniques (Zeiler and Fergus 2014) are tailored to one type of machine learning model (Montavon et al. 2018). We focus on local model-agnostic explanation techniques. These can be further divided into additive vs. non-additive explanations. The sum of importance scores in a local additive explanation equals the predicted output for the explained instance (Lundberg and Lee 2017). In contrast, local non-additive explanations do not satisfy this additivity criterion (Lundberg and Lee 2017). Some of the most popular explanation techniques, such as LIME and SHAP, fall into the former category.

In this section, we first formalize local explanations, as produced by LIME, SHAP, and LPI, and then discuss methods to evaluate such explanations.

2.1 Local explanations

We first present a formalization of the local additive explanations in our study, i.e. LIME and SHAP, based on the notation used in Lundberg and Lee (2017). As discussed in Sect. 1, in local additive explanations, a black-box model's predicted output is decomposed into an additive sum of feature importance scores. In simpler words, each feature importance score is the contribution of that feature to the predicted output of the explained model. The formal representation of local explanations, as produced by LIME and SHAP, is shown in Eq. 1. In this equation, the predicted probability of the black-box model f for a designated class given instance x is decomposed into an additive sum, where \(\phi _j\) is the local feature contribution of feature j and \(x_j\)Footnote 4 is the value of feature j in x.

$$\begin{aligned} f(x) = \sum _{j=1}^M \phi _j x_j \end{aligned}$$
(1)

Local Permutation Importance (LPI) is a local non-additive model-agnostic explanation technique. The core idea behind LPI (Casalicchio et al. 2018) is that the importance of a feature can be estimated by the average change in a black-box model's predicted output when the value of this feature is replaced by another value. To change the feature value, LPI randomly permutes the feature values of a single dimension across all data points in a given dataset.

More formally, LPI is calculated as follows. Let \(\pi \) be a random permutation of the index sequence \(\langle 1, \ldots , N\rangle \), and let \(\pi _i\) denote the position of index i in \(\pi \). The importance of feature j at \(x_n\) is then defined as:

$$\begin{aligned} \Phi _n^j = \frac{1}{N} \sum _{k=1}^N\left( f({{\hat{x}}_k}) - f(x_n)\right) \end{aligned}$$
(2)

where \({\hat{x}}_k\) is defined as follows:

$$\begin{aligned} {\hat{x}}_{k}^l = {\left\{ \begin{array}{ll} x_{n}^l &{} l \ne j \\ x_{\pi _k}^j &{} l = j, \end{array}\right. } \end{aligned}$$
(3)

where \(k \in [1, N]\) and \(l \in [1, M]\). In simpler terms, \({\hat{x}}_{k}\) is equal to \(x_n\) except that the value of the jth feature is replaced by \(x_{\pi _k}^j\). It is noteworthy that in our study, for Logistic Regression and Naive Bayes models, f is the log odds ratio prediction function of class c rather than the predicted probability.
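To make the above concrete, the following is a minimal sketch of LPI as defined by Eqs. 2 and 3, assuming f is a batch prediction (or log-odds) function and X is the reference dataset used for permutation; the function and variable names are ours and not taken from the original implementation.

```python
import numpy as np

def lpi(f, X, x_n, rng=None):
    """Sketch of Local Permutation Importance (Eqs. 2 and 3).

    f   : callable mapping an (N, M) array to an (N,) array of predictions
          (for classifiers, the log odds of the predicted class, as in this study)
    X   : (N, M) reference dataset whose values are used for permutation
    x_n : (M,) instance to be explained
    """
    rng = np.random.default_rng() if rng is None else rng
    N, M = X.shape
    f_xn = f(x_n.reshape(1, -1))[0]
    scores = np.zeros(M)
    for j in range(M):
        pi = rng.permutation(N)                # random permutation of instance indices
        X_hat = np.tile(x_n, (N, 1))           # N copies of x_n ...
        X_hat[:, j] = X[pi, j]                 # ... with feature j replaced by permuted values
        scores[j] = np.mean(f(X_hat) - f_xn)   # average change in the prediction (Eq. 2)
    return scores
```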

2.2 Evaluating local explanations

The evaluation methods for local explanations are categorized into human evaluation and functionally grounded evaluation methods (Doshi-Velez and Kim 2017). In human evaluation methods, the accuracy of a local explanation is measured by how accurately human subjects can guess the prediction of the black-box model when they only have access to the explanation (Poursabzi-Sangdeh et al. 2021). Since human studies are costly and time-consuming, functionally grounded evaluation methods use different proxies to measure the quality of local explanations. This study focuses on evaluating local explanations using the latter type of techniques.

Evaluating local explanations using functionally grounded methods is challenging. We need local explanations, or explanations in general, precisely because we do not understand black boxes, yet a direct evaluation of explanations requires ground truth importance scores, information that is only directly accessible when we can understand the model. Because of this, all evaluation methods for local explanations either measure the explanations indirectly, e.g., via robustness measures, or they impose further assumptions on the data generation process or the type of model explained. We therefore need to keep in mind that these measures study different characteristics of an explanation.

The majority of studies that have evaluated local explanations have focused on three categories of evaluation procedures: evaluating explanations using robustness measures (Sect. 2.2.1), using ground truth feature importance scores from synthetic datasets (Sect. 2.2.2), and using interpretable models (Sect. 2.2.3). This section provides a background on each of these evaluation procedures. Our proposed method belongs to the last category.

2.2.1 Robustness measures

Most studies on evaluating local explanations, especially for neural network models, use robustness measures (Fong and Vedaldi 2017; Montavon et al. 2018; Alvarez-Melis and Jaakkola 2018). Robustness measures do not rely on ground truth importance scores to evaluate explanations (Alvarez-Melis and Jaakkola 2018; Montavon et al. 2018; Adebayo et al. 2018; Lakkaraju et al. 2020; Agarwal et al. 2022). Instead, their main assumption is that nullifying important (unimportant) features identified by a local explanation should cause large (small) changes in the predicted score of the black-box model for that instance. In these measures, the black-box model is used as an oracle to obtain new prediction scores on variations of the explained instance after subsets of its features are nullified. Measures such as faithfulness (Alvarez Melis and Jaakkola 2018), fidelity (Amparore et al. 2021), Prediction Gap on Important Features (PGI), and Prediction Gap on Unimportant Features (PGU) (Agarwal et al. 2022) are all variations of robustness measures. The main reasons behind the popularity of robustness measures are: (1) there is no need to access ground truth importance scores for evaluating local explanations, and (2) they can evaluate local explanations for arbitrary datasets and explained models.

Our study uses the prevalent Deletion and Preservation robustness measures initially proposed in Fong and Vedaldi (2017); Samek et al. (2016). Our definitions follow the notation from Hsieh et al. (2020). Let \(S_{r} \subset U\) be the set of top-K features ranked in descending order by their absolute importance scores obtained from an explanation technique (K is a hyper-parameter), and let \({\bar{S}}_r = U \setminus S_r\), where U is the set of all features. Deletion measures the absolute change in a black-box model's predicted output after replacing the feature values in \(S_r\) with a baseline value. Similarly, Preservation reflects the absolute change in the predicted output of a black-box model after replacing the feature values in \({\bar{S}}_r\) with a baseline. The baseline value can be a binary value or the average value of the corresponding feature in the training or validation set (Fong and Vedaldi 2017). There are no agreed optimal values for these robustness measures; however, a robust explanation should have a relatively large Deletion and a low Preservation value (Montavon et al. 2018).
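As an illustration, the following is a minimal sketch of the Deletion and Preservation measures described above, assuming the baseline is the per-feature training mean; the function name and the batch-style prediction function f are our own assumptions rather than part of the cited definitions.

```python
import numpy as np

def deletion_preservation(f, x, importance, X_train, k):
    """Sketch of the Deletion and Preservation robustness measures.

    f          : black-box prediction function mapping an (N, M) array to an (N,) array
    x          : (M,) explained instance
    importance : (M,) importance scores of a local explanation for x
    X_train    : training data used for the baseline (feature averages)
    k          : number of top features, by absolute importance, forming S_r
    """
    baseline = X_train.mean(axis=0)
    order = np.argsort(-np.abs(importance))      # features ranked by |importance|
    S_r, S_bar = order[:k], order[k:]
    f_x = f(x.reshape(1, -1))[0]

    x_del = x.copy()
    x_del[S_r] = baseline[S_r]                   # nullify the top-k features
    deletion = abs(f(x_del.reshape(1, -1))[0] - f_x)

    x_pres = x.copy()
    x_pres[S_bar] = baseline[S_bar]              # nullify all remaining features
    preservation = abs(f(x_pres.reshape(1, -1))[0] - f_x)
    return deletion, preservation
```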

Robustness measures are intrinsically prone to the following limitations. First, since robustness measures are not calculated based on ground truth importance scores, we cannot argue that robust explanations are necessarily accurate (we show an example of this limitation in Sect. 3). Second, predicting an instance after removing its features can cause out-of-distribution predictions or, at worst, can turn the instance into an adversarial example; hence, the predictions of the oracle can no longer be trusted to evaluate the local explanation (Rahnama and Boström 2019; Hooker et al. 2019; Hsieh et al. 2020). Third, there is a lack of agreement on a unified approach to nullify features (Sturmfels et al. 2020). Fourth, there is no agreement on the optimal threshold for the magnitude of the change in the predicted probability of the model after (un)important features are removed (Alvarez-Melis and Jaakkola 2018; Sturmfels et al. 2020).

In Hooker et al. (2019), the authors propose an extra step of retraining the model after nullifying important features to avoid the problem of out-of-distribution predictions of the explained model, mainly to tackle the second limitation of robustness measures. In their study, model-based explanations of CNNs, such as Integrated Gradients (IG) and Guided Backpropagation, showed low robustness for neural network models trained on the ImageNet dataset. However, the authors do not provide empirical or theoretical evidence that the retrained model has the same properties as the original model we intend to explain. In addition, studies have shown that the correlation relationships among features do not hold in the new model after the retraining step (Nguyen and Martínez 2020). In Agarwal et al. (2022), the authors propose the OpenXAI framework, which includes numerous robustness measures. They showed that model-based gradient explanation techniques such as Gradient\(*\)Input (Shrikumar et al. 2016) provided more robust explanations than LIME and SHAP across numerous datasets.

2.2.2 Ground truth from synthetic datasets

Some studies have proposed to evaluate explanations directly based on ground truth importance scores extracted from synthetic datasets. These studies aim to tackle the first limitation of robustness measures, as discussed in the previous section. The core assumption behind these evaluation methods is that, since obtaining ground truth from black-box models on arbitrary data is challenging, we can instead simplify the data these models are trained on. Using specific data generation processes enables these methods to control the importance of each feature for the generated labels prior to the training phase of the explained models. Local explanations that provide feature importance scores similar to these priors are considered the most accurate.

The Seneca-RC algorithm (Guidotti 2021) generates data from a polynomial function that can include operators such as sin or cos in its polynomial terms. After that, a sample is generated based on the chosen polynomial function. Lastly, the algorithm returns the ground truth importance scores for the explained instance x based on the following steps: (1) the closest instance \(x^*\) to x on the decision boundary of the explained model, g, is found, and (2) the derivative of the ground truth polynomial is evaluated at this point and returned as the true importance scores for x.

In Liu et al. (2021), the authors provide a set of synthetic datasets and evaluate the quality of local explanations using (robustness) measures such as faithfulness and fidelity. SHAP and SHAPR (Aas et al. 2021) explanations were observed to have higher faithfulness than LIME and Model Agnostic SuPervised Local Explanations (MAPLE) (Plumb et al. 2018) explanations for the considered set of synthetic datasets. In their evaluation, the authors also show that LIME, SHAP, and MAPLE explanations fail to provide accurate explanations for (synthetic) tabular datasets with large numbers of uninformative features.

In Agarwal et al. (2022), the authors proposed the synthetic SynthGauss dataset. They argue that their dataset is more suitable for evaluating explanations than the datasets in Liu et al. (2021), since its features are independent and local neighborhoods do not overlap. In their study, model-based gradient explanations such as SmoothGrad (Omeiza et al. 2019) were observed to outperform LIME and SHAP explanations across numerous datasets.

The main limitations of evaluation approaches based on synthetic ground truth are two-fold: (1) since the priors on feature importance scores are set before the explained model is trained, there is no guarantee that the model has learned the relationship between the features and the label in the synthetic dataset according to these prior importance scores (Faber et al. 2021), and (2) synthetic datasets are not as complex as many real tabular datasets in terms of empirical feature distributions and interactions between features (Guidotti 2021). As a result, we cannot directly conclude that a local explanation that is inaccurate on these synthetic datasets is also inaccurate on larger and more complex datasets.

2.2.3 Ground truth using interpretable models

As mentioned earlier, we cannot directly extract ground truth importance scores from complex black-box models. The extraction of ground truth becomes easier if we instead explain a simpler class of machine learning models. The methods that extract ground truth importance scores from interpretable models follow this assumption. The strength of these evaluation methods is that we are no longer restricted to evaluating local explanations on simplified datasets: the ground truth importance scores are obtained directly from the trained model. Unlike ground truth from synthetic datasets, we can guarantee that these importance scores are extracted directly from the knowledge that exists in the trained model. However, this comes at the cost of only being able to evaluate simple models. This type of evaluation can thus only be used as a sanity check and not to evaluate the accuracy of explanations of any model on any dataset.

In Agarwal et al. (2022), the authors proposed to extract ground truth importance scores from the weights of a Logistic Regression model. Based on their evaluation, model-based explanations such as SmoothGrad have larger similarities to their proposed ground truth than LIME and SHAP explanations. The main limitation of their baseline for extracting ground truth is that the weights of Logistic Regression constitute a global explanation. A global explanation is a single vector of feature importance scores for an entire dataset that is equal for all instances (Freitas 2014). Local explanations, on the other hand, reflect properties of the locality of an instance in the input space (Ribeiro et al. 2016). Measuring the similarity of each unique local explanation to one and the same global explanation can therefore lead to incorrect conclusions.Footnote 5

In this study, we propose a method that extracts local ground truth importance scores for three linear additive models: linear regression, logistic regression, and naive Bayes. In contrast to the aforementioned method, we thus know how much each feature contributes to the model's predicted output for a specific instance and can directly compare this contribution to the importance scores generated by a local explanation technique.

3 Motivating example

In this section, we present an example that highlights why the current evaluation methods cannot correctly evaluate local explanations of linear additive models. Let us reiterate that we expect a local ground truth importance score to reflect some of the instance's locality in the model's decision space. We show that the currently available approaches either allocate equal ground truth scores to all instances, disregarding the instance locality, or fail to allocate the correct importance to all features. Our example uses Seneca-RC's synthetic dataset generation and compares the baselines from synthetic datasets proposed by Guidotti (2021), the ground truth proposed by Agarwal et al. (2022), and robustness measures (Hsieh et al. 2020; Fong and Vedaldi 2017).

Let \(Y = 2 x_0 - x_1\) be the data generation process, where feature \(x_0\) contributes positively and \(x_1\) negatively to the label. We sample one thousand instances from Seneca-RC's data generation process, where no extra redundant features are added, and we set the noise level to 0.3. We train a Logistic Regression model on this generated datasetFootnote 6. The model achieves a test accuracy of 0.98 on this dataset. The decision boundary (see Fig. 1) shows that the model has correctly identified that both features in combination are important for separating instances from different classes. The arrows on top of each instance represent the ground truth importance scores based on each evaluation method.
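A rough sketch of this setup is given below, assuming labels are obtained by thresholding \(Y = 2x_0 - x_1\) with Gaussian noise of scale 0.3; the actual Seneca-RC generator differs in its details, so this is only an approximation of the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                        # two informative features, no redundant ones
noise = rng.normal(scale=0.3, size=1000)              # noise level 0.3
y = (2 * X[:, 0] - X[:, 1] + noise > 0).astype(int)   # labels from Y = 2*x_0 - x_1

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print(clf.score(X_te, y_te))                          # close to the 0.98 reported above
```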

The Seneca-RC ground truth importance scores are all equal to the vector \([1, -1]\), irrespective of the position of the instance in the prediction space or relative to the decision boundary of the model. This is because the derivative of the data generation process with respect to each feature is a constant. Therefore, the ground truth from Seneca-RC does not reflect the true locality of instances in the decision space of the trained Logistic Regression model (Sect. 2.2.2).

The ground truth of OpenXAI (Agarwal et al. 2022) is also constant across all instances. This is because the weights of the Logistic Regression model are directly used as the baseline for obtaining ground truth importance scores for all instances in this approach. Since the model weights are a summary of the importance of features over all instances, every instance is evaluated against one and the same ground truth score vector, regardless of its position in the decision space (see Sect. 2.2.3 for details).

For the robustness measures, we no longer show the ground truth but the robustness values of each feature for every instance. The arrow on top of an instance shows the absolute change in the predicted score of class one (blue circles) after that feature is nullified separately. We nullify each feature using the average value of that feature in the dataset, as is common practice for tabular datasets (Liu et al. 2021; Montavon et al. 2018; Molnar et al. 2022). We can see that for most instances, the robustness measures do not assign any importance to the second feature on the y-axis, even though it plays an important role in the linear boundary of the logistic regression model and in the data generation process. Moreover, an instance will receive zero robustness along an axis, i.e. for a feature, if its value for that feature is close to the empirical average of the feature, because nullifying it will not affect the predicted output.

On the other hand, our proposed Model-Intrinsic Additive Scores (MIAS) allocate different values to instances depending on their location in the decision space of the Logistic Regression model. As we show later in Sect. 4.3, the MIAS scores of Logistic Regression models assign importance to both features when explaining the log odds ratio of the model. We can also see that the instances have arrows pointing toward the subspace with the maximum log odds of their predicted class, visualized by the shades in the background. The MIAS vectors of instances close to the decision boundary differ more, since the uncertainty in the model's predicted output is larger in those parts of the plane. In the next section, we present how to calculate the MIAS scores of linear additive models such as Linear and Logistic Regression and Gaussian Naive Bayes.

Fig. 1 Comparison of ground truth importance scores of Seneca-RC and OpenXAI along with robustness values of each feature compared to our Model-intrinsic Additive Score (MIAS). The dataset is generated by the Seneca-RC algorithm and a Logistic Regression model. The ground truth importance score for each instance is visualized as vectors on top of each instance

4 Evaluation methodology

In this section, we introduce our proposed evaluation framework in detail. As shown in Sect. 3, all current evaluation measures have shortcomings in the way in which ground truth importance scores are allocated for linear additive models.

Our study proposes a new method for evaluating local model-agnostic explanations of linear additive models. Our evaluation method falls into the category of evaluation methods using interpretable models (Sect. 2.2.3). Unlike the work of Agarwal et al. (2022), we follow a more principled approach to the evaluation of local explanations: we obtain the ground truth by extracting the individual additive terms from the prediction function of any class of linear additive models, e.g., Logistic and Linear Regression and Naive Bayes.

Our approach can extract ground truth importance scores from models whose prediction function is linear additive, e.g. Linear Regression models. Moreover, we also extract the ground truth for models whose prediction function is not directly linear additive but can be transformed into a linear additive function, such as Logistic Regression and Naive Bayes models. As shown in Sect. 3, our ground truth importance scores are allocated at the level of single instances.

In Sect. 4.1, we discuss the main logic behind our evaluation method for extracting the so-called Model-intrinsic Additive Scores (MIAS) of linear additive models such as Linear and Logistic Regression and Gaussian Naive Bayes. The extraction of MIAS scores for each of these models is detailed in Sects. 4.2, 4.3, and 4.4. Lastly, we argue for the choice of similarity metric in Sect. 4.5.

4.1 Model-intrinsic additive scores

As mentioned, we follow a principled approach to extract the ground truth importance scores. We formulate the problem as follows. Eq. 1 gives one linear additive decomposition of f(x). If the prediction function f can itself be represented as an additive sum similar to Eq. 1, namely:

$$\begin{aligned} f(x) = \sum _{j=1}^M \lambda _j x_j, \end{aligned}$$
(4)

we can measure the explanation accuracy by measuring the similarity of the individual additive terms \(\phi _j x_j\), the importance scores of feature j in Eq. 1, to \(\lambda _j x_j\). This is possible as both equations are linear additive decompositions of f(x). An additive structure like Eq. 4 is directly visible in linear additive models such as linear regression, and it can also be extracted for Logistic Regression and Naive Bayes models. Even though these additive structures have long existed in the machine learning literature, they have, to the best of our knowledge, not been used as a means to evaluate local explanations.

Definition 1

Local explanation accuracy: Let \(\Phi \) be a local explanation for instance x. The local explanation accuracy is defined as \(\sum _{j=1,\ldots , M} d(\phi _j x_j, \lambda _j x_j)\), where \(\lambda _j\) is the weight for feature j in the form of Eq. 4 and d is a similarity metric. Based on this, we call \(\lambda _j x_j\) the Model-Intrinsic Additive Score (MIAS) for feature j.

We want to highlight that our proposed MIAS score includes the input instance’s feature value in calculating the ground truth. In other words, unlike global explanations, each MIAS score is specific to the single instance explained. The inclusion of feature values in calculating our ground truth is similar to the proposal of Liu et al. (2019) in which the input feature values are used for obtaining the local gradient-based explanations for neural network models.

Algorithm 1 summarizes the logic of our evaluation framework. On a high level, to evaluate an explanation technique g for a linear additive model f, we extract the Model-intrinsic Additive Scores (MIAS) \(\Lambda \). After that, we obtain a local model-agnostic explanation \(\Phi \) for a single instance \(x_n\) from the explanation technique g. We then compute the similarity between \(\Lambda \) and \(\Phi \) using a similarity metric \(\rho \), i.e. \(\rho (\Lambda , \Phi )\).

In general, we are interested in comparing explanation accuracy across different datasets. Therefore, we run Algorithm 1 over the test set of each dataset. The higher the average value of \(r_{x_n, f, g}\) over a test set, the more accurate the explanations of g are when explaining model f on that dataset.

Algorithm 1 Measuring the local explanation accuracy of an explanation technique g when explaining a linear additive model f using MIAS scores
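A minimal Python sketch of this loop is given below, assuming an explain function for the technique g, an extract_mias function implementing t, and Spearman's rank correlation as the similarity metric \(\rho \); all names are ours, and absolute scores are compared following Sect. 5.1.2.

```python
import numpy as np
from scipy.stats import spearmanr

def explanation_accuracy(explain, extract_mias, X_test,
                         rho=lambda a, b: spearmanr(a, b)[0]):
    """Sketch of Algorithm 1: compare local explanations to MIAS scores.

    explain      : callable returning the local explanation Phi for one instance
    extract_mias : callable returning the MIAS vector Lambda for one instance (function t)
    rho          : similarity metric (Spearman's rank correlation by default)
    """
    accuracies = []
    for x_n in X_test:
        Lambda = extract_mias(x_n)   # model-intrinsic additive scores
        Phi = explain(x_n)           # local model-agnostic explanation
        accuracies.append(rho(np.abs(Phi), np.abs(Lambda)))  # compare absolute scores
    return float(np.mean(accuracies))                        # average accuracy over the test set
```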

The logic behind the extraction of MIAS importance scores for Linear Regression, Logistic Regression, and Gaussian Naive Bayes models is discussed in Sects. 4.2, 4.3, and 4.4, respectively (function t in Algorithm 1).

4.2 Linear regression

As noted earlier, the linear regression model has a linear additive structure of the following form:

$$\begin{aligned} f(x) = w_0 + \sum _{j=1}^M w_j x_j \end{aligned}$$
(5)

where \(w_j\) is the weight for feature j, \(w_0\) represents the intercept, and \(x_j\) is the j-th component of x. In our study, we consider \(w_j x_j\) as the MIAS score for the contribution of feature j to the predicted output f(x), i.e. \(\Lambda _j = w_j x_j\).
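For instance, with a fitted scikit-learn LinearRegression model (an implementation assumption on our part, not stated in the paper), the MIAS vector of an instance can be read off directly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def mias_linear_regression(model: LinearRegression, x: np.ndarray) -> np.ndarray:
    """MIAS scores for a linear regression model: the additive terms w_j * x_j (Eq. 5).

    The intercept w_0 is not attributed to any individual feature and is left out.
    """
    return model.coef_ * x
```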

4.3 Logistic regression

Given weights \(w \in {\mathbb {R}}^{M+1}\) and an instance \(x_n \in {\mathbb {R}}^{M}\), a logistic regression model is defined as:

$$\begin{aligned} P(y_n = c \,\vert \,x_n, w) = \frac{1}{1 + e ^{{-\sum _{m=0}^{M} w^m x_n^m}}} \end{aligned}$$
(6)

where \(x_n^0 = 1\). Even though there is no direct linear additive form of this prediction function, we can derive an additive decomposition of a model prediction using the log odds ratio of \(x_n\) with respect to class \(c \in \{0,1\}\):

$$\begin{aligned} log \frac{P(y_n = c \,\vert \,x_n, w )}{P(y_n = \lnot c \,\vert \,x_n, w )} = \sum _{m=0}^{M} w^m x_n^m \end{aligned}$$
(7)

where \(\lnot c\) is the complement of class c and \(\lambda _n^m = w^m x_n^m\) is the Model-Intrinsic Additive Score (MIAS) for feature m. Note that in this case, we explain the log odds and therefore, \(f(x) \leftarrow \log \frac{P(y_n = c \,\vert \,x_n, w )}{P(y_n = \lnot c \,\vert \,x_n, w )}\) in Eq. 4.
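A small sketch of this extraction, assuming a fitted binary scikit-learn LogisticRegression model and explaining the log odds of the predicted class, could look as follows; the sign handling for the complement class reflects our reading of Eq. 7 together with scikit-learn's parameterization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mias_logistic_regression(model: LogisticRegression, x: np.ndarray) -> np.ndarray:
    """MIAS scores for a binary logistic regression model (Eq. 7).

    scikit-learn's coefficients parameterize the log odds of classes_[1] versus
    classes_[0]; for the opposite predicted class the signs are flipped.
    The intercept term is excluded.
    """
    c = model.predict(x.reshape(1, -1))[0]        # explain the predicted class
    sign = 1.0 if c == model.classes_[1] else -1.0
    return sign * model.coef_[0] * x              # lambda_n^m = w^m * x_n^m
```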

In Fig. 2, we compare the weights of a Logistic Regression model (its global explanations) to the MIAS scores obtained for a single test instance of the Pima Indians dataset. Notice that the global explanation will be the same for all test instances, whereas MIAS scores are different for each instance.

Fig. 2 (Left) The global explanations of the Logistic Regression model trained on the Pima Indians dataset, which are equal for all test instances if used as a local ground truth importance score. (Right) MIAS scores of a test instance of the same dataset. MIAS scores differ for each test instance depending on its feature values

4.4 Naive Bayes

Given input \(x_n = (x_n^1,\ldots , x_n^M)\) and per-class mean and variance vectors \(\mu _c \in {\mathbb {R}}^{M}\) and \(\sigma _c \in {\mathbb {R}}^{M}\), we can apply Bayes' theorem:

$$\begin{aligned} P(y_n = c \,\vert \,x_n) = \frac{P(x_n \,\vert \,y_n = c ) P(y_n = c)}{P(x_n)}, \end{aligned}$$
(8)

where the likelihood \(P(x_n \,\vert \,y_n = c )\), under the naive assumption of conditional independence, can be computed as:

$$\begin{aligned} \prod _{m=1}^M P(x_n^m \,\vert \,y_n = c ) = \prod _{m=1}^M {\mathcal {N}}(x_n^m \,\vert \,\mu _c^m, \sigma _c^m). \end{aligned}$$
(9)

Similar to the case of logistic regression, the prediction function does not naturally decompose into additive parts. However, the log odds ratio for an instance \(x_n\) for class c has an intrinsic natural additive decomposition:

$$\begin{aligned} \text {log} \frac{P(y_n = c \,\vert \,x_n)}{P(y_n = \lnot c \,\vert \,x_n)} = \sum _{m=1}^{M} \text {log} \frac{ {\mathcal {N}}(x_n^m \,\vert \,\mu _c^m, \sigma _c^m) }{{\mathcal {N}}(x_n^m \,\vert \,\mu _{\lnot c}^m, \sigma _{\lnot c}^m)} + const. \end{aligned}$$
(10)

where \(const = \text {log}\frac{P(y_n=c)}{P(y_n=\lnot c)}\). Based on this, the MIAS importance score of feature m is \(\lambda _n^m = \text {log} \frac{ {\mathcal {N}}(x_n^m \,\vert \,\mu _c^m, \sigma _c^m) }{{\mathcal {N}}(x_n^m \,\vert \,\mu _{\lnot c}^m, \sigma _{\lnot c}^m)}\). Note that in this case, instead of f(x), we explain the log odds ratio prediction in Eq. 4.
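A sketch of the same extraction for a fitted binary scikit-learn GaussianNB model is given below; the theta_ and var_ attributes hold the per-class feature means and variances in recent scikit-learn versions, and the class-prior constant is dropped as in Eq. 10.

```python
import numpy as np
from scipy.stats import norm
from sklearn.naive_bayes import GaussianNB

def mias_gaussian_nb(model: GaussianNB, x: np.ndarray) -> np.ndarray:
    """MIAS scores for a binary Gaussian naive Bayes model (Eq. 10).

    Each score is the per-feature log ratio of Gaussian likelihoods under the
    predicted class c versus its complement; the class-prior constant is excluded.
    """
    c = int(model.predict(x.reshape(1, -1))[0] == model.classes_[1])  # index of the predicted class
    not_c = 1 - c
    log_num = norm.logpdf(x, loc=model.theta_[c], scale=np.sqrt(model.var_[c]))
    log_den = norm.logpdf(x, loc=model.theta_[not_c], scale=np.sqrt(model.var_[not_c]))
    return log_num - log_den
```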

In Fig. 3, an example of our MIAS scores for a single instance is visualized in comparison to the explanations of LIME, SHAP, and LPI on the Pima Indians dataset when explaining the Logistic Regression model. See the appendix for a similar visualization for the Naive Bayes model on this dataset.

Fig. 3 The feature importance scores of MIAS as well as LIME, SHAP, and LPI explanations for a single instance from the Pima Indians dataset when explaining a logistic regression prediction

4.5 Similarity measure

We measure accuracy in terms of how similar an explanation is to the MIAS importance scores. Several studies have used measures such as Cosine similarity or Euclidean distance (Montavon et al. 2018; Yang and Kim 2019) to measure the similarity of explanations.

Similar to Ghorbani et al. (2019), we argue that Spearman's rank correlation may sometimes be a more suitable measure for comparing explanations in tabular datasets, as it is not affected by the absolute values of importance scores but only by the ranking of these values. Additionally, the interpretation of importance scores might differ across different types of explanation techniques; a rank correlation measure makes it possible to compare feature importance scores between additive and non-additive explanation techniques whose scores cannot be compared directly. Lastly, in contrast to Euclidean and Cosine similarity, the metric comes with an interpretable measure of direction and strength. One drawback of using a rank-based measure is that it can be sensitive to datasets with many unimportant feature dimensions: in this case, the accuracy of all explanation techniques will be low, as the ranking of the unimportant features varies randomly.

An incorrect choice of a similarity metric that does not fit the use case may lead to wrong conclusions. To illustrate this, we provide an example comparing two local explanations using Euclidean and Cosine similarity along with Spearman’s rank correlation. Suppose we need to measure the accuracy of two different explanations \(\phi _1 = [0.21, 0.1, 0.32]\) and \(\phi _2 = [0.21, 0.3, 0.12]\) to the ground truth score \(\lambda = [0.32, 0.2, 0.42]\).

$$\begin{aligned} \begin{array}{rl} Euclidean\, S(\lambda , \phi _1) &{}= 0.179 \\ Spearman\, C(\lambda , \phi _1) &{}= 1 \\ Cosine\,S(\lambda , \phi _1) &{}= 0.99 \\ \end{array} \qquad \qquad \qquad \begin{array}{rl} Euclidean\, S(\lambda , \phi _2) &{}= 0.28 \\ Spearman\, C(\lambda , \phi _2) &{}= -1 \\ Cosine\, S(\lambda , \phi _2) &{}= 0.81 \\ \end{array} \end{aligned}$$

Based on Spearman's rank correlation, the ranking of \(\phi _1\) correlates perfectly with \(\lambda \), while the ranking of \(\phi _2\) correlates negatively with \(\lambda \). Using this rank-based metric, we can thus conclude that explanation \(\phi _1\) is more accurate than \(\phi _2\). The EuclideanFootnote 7 and Cosine similarity instead vote in favor of \(\phi _2\) as the more accurate explanation. We show the role of the similarity metric in our experiments later in Sect. 5.2.6.

5 Empirical investigation

In this section, we present the results of our empirical investigation. We describe the experimental setup in Sect. 5.1. After that, in Sect. 5.2, we provide the results of our experiments on local explanation accuracy for all explanation techniques and models considered.Footnote 8

5.1 Experimental setup

In this section, we describe the datasets and models used for obtaining explanations in Sect. 5.1.1. After that, we provide information about the hyperparameters for generating explanations in Sect. 5.1.2.

5.1.1 Data and model

We assess the proposed evaluation framework using a total of 40 different tabular datasets covering both (binary and multi-class) classification and regression tasks. All datasets are publicly available at the UCI, Kaggle, or Keel repositories.Footnote 9 Unless otherwise stated, the numerical features are standardized and categorical features are one-hot encoded. For datasets for which no separate test set is provided at the source, a random hold-out set of 25% was used. Information on each dataset is given in the appendix.

We trained logistic and linear regression models along with Gaussian naive Bayes models on the aforementioned datasets. To tune the hyper-parameters of the logistic regression models, grid search was employed. Hyper-parameters were chosen after 100 trials, with the hyper-parameter space consisting of L1 and L2 regularization and the regularization parameter selected from a grid of values between 0 and 4. Tables 1 and 2 report the test performance of the models.

Table 1 Information about the datasets used in our study for classification tasks
Table 2 The total number of features and the test set mean squared error for the linear regression (LR) model for all our regression datasets
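A hypothetical sketch of the hyper-parameter search described above is shown below; the exact grid, solver, and cross-validation settings used in the study are not reported, so these are illustrative assumptions (note that scikit-learn parameterizes regularization through C, the inverse of the regularization strength).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative grid: L1/L2 penalties with 50 regularization strengths in (0, 4],
# giving 100 candidate hyper-parameter settings in total.
param_grid = {
    "penalty": ["l1", "l2"],
    "C": 1.0 / np.linspace(0.08, 4.0, 50),
}
search = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid,
    cv=5,
)
# search.fit(X_train, y_train); the tuned model is search.best_estimator_
```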

5.1.2 Generating explanations

For LIME and SHAP, the official Python packages TabularLIME (Ribeiro et al. 2016) and KernelShap (Lundberg and Lee 2017) have been used. We want to emphasize that the KernelShap explainer is model-agnostic and outputs approximated SHAP values. This contrasts with model-based explainers such as LinearSHAP, where the SHAP values are analytically deducible from closed-form equations (see Lundberg and Lee 2017 for details). In our study, we compare model-agnostic explanations, where the explainers make no assumptions about the class of machine learning models they are explaining. The number of samples generated for LIME and SHAP is 5000; we motivate this choice of sample size in Sect. 5.2.4. The sample size of LPI is equal to the size of the training set, as suggested in Casalicchio et al. (2018). This means the sample size of LPI is, on average, significantly smaller than that of LIME and SHAP.

As mentioned earlier in Sect. 4.1, since the MIAS scores are extracted from the log odds ratios of instances for logistic regression and naive Bayes, we need to pass the log odds ratio prediction function to all explanation techniques. For this, all we need to do is write the log odds prediction function and pass it to the explanation techniques; in both the LIME and SHAP packages, one can pass in any desired prediction function for obtaining explanations. In the case of LPI, we have replicated the algorithm proposed in Casalicchio et al. (2018) such that, for logistic regression and naive Bayes models, the importance scores are calculated based on the difference in the predicted log odds ratio scores rather than the prediction function following the permutation of each feature (see Sect. 2.1). Lastly, for each instance, explanations are obtained for the class predicted by the explained classification model. To focus on evaluating the important features, we compare the absolute values of the importance scores from local explanations with our proposed MIAS scores, as is common for tabular datasets (Ribeiro et al. 2016; Lundberg and Lee 2017).
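A sketch of how such a log-odds prediction function might be passed to the two packages is shown below, assuming a fitted binary classifier clf, a training matrix X_train, and an instance x_n; the exact argument names can vary between package versions, and one simple option (used here) is to treat the scalar log odds as a regression target in LIME.

```python
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer

def log_odds(clf, X, c=1):
    """Log odds of class c versus its complement for a batch of instances X."""
    p = np.clip(clf.predict_proba(X)[:, c], 1e-12, 1 - 1e-12)
    return np.log(p / (1.0 - p))

# LIME: explain the scalar log-odds output in regression mode.
lime_explainer = LimeTabularExplainer(X_train, mode="regression")
lime_exp = lime_explainer.explain_instance(
    x_n, lambda X: log_odds(clf, X), num_samples=5000
)

# KernelSHAP: pass the same log-odds function together with a background sample.
shap_explainer = shap.KernelExplainer(lambda X: log_odds(clf, X), shap.sample(X_train, 100))
shap_values = shap_explainer.shap_values(x_n, nsamples=5000)
```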

5.2 Experiments

In this section, we provide the results of measuring the explanation accuracy for all of our studied explained models, namely linear and logistic regression and naive Bayes. Factors contributing to the average accuracy values are discussed separately in Sects. 5.2.2 to 5.2.5; the study of these factors is based on the explanation accuracy when using Spearman's rank correlation. We discuss the effect of the choice of similarity metric on the explanation accuracy in Sect. 5.2.6. The effect of the empirical distribution of features on the explanation accuracy of classification models is discussed in Sect. 5.2.7. In Sect. 5.2.8, we measure the robustness of the explanations on our datasets and examine whether the explanations with the largest average accuracy are also the most robust.

5.2.1 All datasets

We first investigate the explanation accuracy of local explanations for linear regression models. In Table 3, the average explanation accuracy of the additive (LIME and SHAP) and non-additive (LPI) explanations of Linear Regression models is shown. The explanation accuracy is the similarity of each explanation to our proposed MIAS score based on Spearman's rank correlation. Overall, SHAP has a larger average explanation accuracy across all regression datasets than LIME and LPI. LPI outperforms the other techniques on the Istanbul and Kin8Nm datasets, and LIME does so on Wine White and Delta A.

Since our proposed similarity measure is an average correlation, we expect the average accuracy values to be significant, e.g., above 0.7 and not lower than 0.5 (Ross 2017). The average explanation accuracy of LIME and SHAP explanations passes this threshold across all datasets. Due to the consistent behavior of these explanations on regression datasets, we can consider them accurate for explaining linear regression models. Surprisingly, in some datasets, such as Istanbul, Kin8Fh, and Kin8Fm, LPI provides the largest average explanation accuracy compared to LIME and SHAP. However, this trend is inconsistent across other datasets, for example, Wine White and Red or Quakes. In Sect. 1, we raised the question of whether the linear additivity of local explanations can be an advantage in providing accurate local explanations. Our results suggest that the additivity of explanations is indeed advantageous when explaining linear regression models. In Sect. 5.2.5, we show that one main reason behind LPI's low average explanation accuracy is the large variance in its accuracy values.

Table 3 Average explanation accuracy based on Spearman’s rank correlation for LIME, SHAP (additive), and LPI (non-additive) explanations when explaining linear regression model

The results for the average explanation accuracy for Logistic Regression and Naive Bayes models are shown in Table 4. LIME provides the largest average accuracy when explaining Logistic Regression, whereas SHAP explanations have the largest average accuracy when explaining the Gaussian Naive Bayes model. The difference in average explanation accuracy over all classification datasets is smaller when explaining the Naive Bayes models than when explaining Logistic Regression. To our surprise, LPI outperforms the additive explanations of LIME and SHAP across numerous datasets, e.g., Donor and Haberman when explaining Logistic Regression, and Banknote and Spambase when explaining naive Bayes models. Unlike our results for the explanations of Linear Regression models, the average accuracy of local explanations can be significantly low across numerous datasets, such as Adult, Attrition, Audit, Churn, Donor, Hattrick, Heart Disease, HR, Insurance, Seismic, Thera, and Titanic. For example, the largest average explanation accuracy for the Audit dataset is obtained by LIME, with 0.075 for Logistic Regression, and SHAP, with 0.061 for Naive Bayes. This means that even the best-performing explanations could not find the correct ranking of the most important features, even for 10% of the instances in the test set. Our results suggest that the explanations of linear additive classification models in our study do not exhibit acceptable accuracy to pass our sanity check.Footnote 10

In some datasets, explanation accuracy reaches an acceptable threshold, e.g., for all explanations of both models on the Banknote, Iris, and Pima Indians datasets, where all average explanation accuracy values are above 0.7. This is partly because these datasets have few numerical and no categorical features. In contrast, the low explanation accuracy values on the Donors dataset can be partially explained by the presence of a large number of categorical features. We discuss the effect of data on explanation accuracy in Sect. 5.2.2.

When explaining the Logistic Regression models, the low average explanation accuracy values on the HR and Titanic datasets for all explanation techniques can be explained by the low predictive performance of the model on the test set, i.e., poor model generalization. We discuss the effect of model generalization further in Sect. 5.2.3.

However, there are cases where we cannot blame the model's generalization as the main factor behind the low explanation accuracy values. For example, the Logistic Regression and Naive Bayes models achieve acceptable generalization accuracy on the Thera and Heart Disease datasets, respectively. Yet, the average explanation accuracy on these datasets is relatively low across all explanation techniques. In Sect. 5.2.5, we show that the low average explanation accuracy on these classification datasets is caused by the large standard deviation of explanation accuracy within each dataset: although the explanations can be very accurate for a subset of instances, they are also inaccurate for others.

Table 4 Average explanation accuracy for LIME, SHAP and LPI explanations when explaining Logistic Regression and naïve Bayes Models based on Spearman’s rank correlation

5.2.2 The data effect

In some studies (Molnar et al. 2022; Guidotti 2021), the authors have presented specific synthetic datasets in which the accuracy of local model-agnostic explanations decreases as the number of numerical and categorical features increases. In this section, we investigate whether there is a linear relationship between the number of numerical, categorical, and pairwise correlated features and the average explanation accuracy at the dataset level. We first investigate this in synthetic settings and then in our tabular datasets. Overall, we show that the relationship between these data-related factors and explanation accuracy highly depends on the type of linear additive model and the explanation technique used for obtaining explanations.

5.2.2.1 Numerical Features

Let us begin by studying the effect of numerical features on explanation accuracy in synthetic datasets. We use Scikit-Learn's (Kramer and Kramer 2016) synthetic classification and regression dataset generators and consider 20% of all features as uninformative. In the experiment, we increase the number of numerical features in our synthetic datasets from 1 to 45 without adding any categorical features. To control for the effect of model generalization, we only consider models with relatively similar accuracy values (see Table 12 in the Appendix). Figure 4 shows that increasing the number of numerical features minimally affects the average explanation accuracy of the LIME and SHAP explanations of Linear Regression models, yet it decreases the average accuracy of the LPI explanations of this model. The SHAP (LIME) explanations of Logistic Regression have larger (smaller) average accuracy as the number of numerical features increases, while no significant change in the average accuracy of LPI explanations is visible for this model. For Naive Bayes, the average accuracy of all explanations decreases with an increase in the number of numerical features.

Fig. 4 The average explanation accuracy as the number of numerical features increases in synthetic datasets. Pearson correlation values, together with the p-values, are included in the legend

We next investigate whether the same trends hold in our tabular datasets. In Fig. 5, we can see the effect of these factors on the explanations of all linear additive models; each point represents the average explanation accuracy on a single tabular dataset. The figure shows that numerical features have a minimally positive effect on the average explanation accuracy for Linear Regression models. With an increased number of numerical features, the average explanation accuracy of LIME for Logistic Regression increases, and the average accuracy of LIME and SHAP decreases for Naive Bayes explanations. Some trends are similar between the synthetic and tabular datasets: LPI and SHAP explanations of Linear Regression models show minimal change in accuracy with an increase in numerical features, LPI shows the same trend for the explanations of Logistic Regression, and the average accuracy of LIME and SHAP explanations of Naive Bayes models decreases as the number of numerical features increases.

Fig. 5 Linear relationship between average explanation accuracy and the number of numerical features over all tabular datasets. Pearson correlation values, together with the p-values, are included in the legend

5.2.2.2 Categorical features

To analyze the effect of categorical features, we start in a synthetic setting. We fix the total number of features in our synthetic dataset generator to 40 numerical features. In each step, we turn K of these features into categorical features and one-hot encode them, while keeping the remaining \(40 - K\) features unchanged. The number of categorical features is counted before they are one-hot encoded. In Fig. 6, the average explanation accuracy of all explanations of the Linear Regression model decreases as the number of categorical features increases. The average accuracy of the LIME and SHAP explanations of Logistic Regression shows a slight decrease as the number of categorical features increases. Lastly, the average accuracy of the SHAP explanations of naive Bayes models decreases with an increase in the number of categorical features.
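A sketch of how such a mixed dataset can be constructed is shown below; the particular discretization (quantile binning into four bins) and generator settings are our assumptions, since the paper does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import KBinsDiscretizer

def make_mixed_dataset(K, n_features=40, n_bins=4, seed=0):
    """Turn K of the numerical features into one-hot encoded categorical ones."""
    X, y = make_classification(
        n_samples=2000, n_features=n_features,
        n_informative=int(0.8 * n_features),     # 20% of the features uninformative
        random_state=seed,
    )
    if K == 0:
        return X, y
    binner = KBinsDiscretizer(n_bins=n_bins, encode="onehot-dense", strategy="quantile")
    X_cat = binner.fit_transform(X[:, :K])       # K features discretized and one-hot encoded
    return np.hstack([X_cat, X[:, K:]]), y       # the remaining 40 - K features are unchanged
```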

Fig. 6 The change in average explanation accuracy with the number of categorical features. In these datasets, K features are transformed into categorical features, and the remaining \(40 - K\) features are kept unchanged. Pearson correlation values, together with the p-values, are included in the legend

In Fig. 7, we perform a similar analysis to study the effect of categorical features in our tabular classification datasets. With an increased number of categorical features, all explanations of the Logistic Regression and Naive Bayes models show a steady decrease in their average accuracy. We can see some similar trends between the synthetic and tabular datasets: the LIME and SHAP explanations of Logistic Regression, and the SHAP explanations of naive Bayes models, show a decrease in average accuracy as the number of categorical features increases.

Fig. 7 Linear relationship between average explanation accuracy and the number of categorical features across all classification datasets. Pearson correlation values, together with the p-values, are included in the legend

5.2.2.3 Correlated features

Lastly, we examine the effect of the number of pairwise correlated features on the explanation accuracy of all linear models in our tabular datasets (Fig. 8). We consider two features correlated only if their pairwise Pearson correlation value is larger than 0.75. An increase in the number of correlated features increases the average explanation accuracy of LIME and SHAP for Logistic Regression and Naive Bayes models. In this case, our experiments contradict the findings in the works of Molnar et al. (2022) and Gosiewska and Biecek (2019), in which the authors presented synthetic examples showing that correlated features in the dataset can contribute to low accuracy of local explanations. One possible explanation for this incompatibility between our results and the aforementioned studies is that feature correlation can play a significant role when the feature vectors are correlated with the predicted output of the explained models; as our explained models do not include interaction terms, these correlation effects are negligible in our experiments.
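For reference, a simple sketch of counting such features is given below, assuming that a feature appearing in at least one pair with absolute Pearson correlation above 0.75 is counted once; the exact counting convention used in the paper may differ.

```python
import numpy as np
import pandas as pd

def n_correlated_features(df: pd.DataFrame, threshold: float = 0.75) -> int:
    """Number of features involved in at least one pair with |Pearson r| > threshold."""
    corr = df.corr(method="pearson").abs().to_numpy()
    np.fill_diagonal(corr, 0.0)                  # ignore self-correlation
    return int((corr > threshold).any(axis=1).sum())
```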

Fig. 8 The change in average explanation accuracy with the number of pairwise correlated features in tabular datasets. Pearson correlation values, together with the p-values, are included in the legend

5.2.3 Model generalization

Some studies (Molnar et al. 2022; Guidotti 2021) have proposed that explanation accuracy increases with the predictive performance of the explained model (the model generalization), and they show synthetic datasets in which this hypothesis holds. In this section, we study the linear relationship between the average explanation accuracy on each dataset and the model's test set performance for linear additive regression and classification models.

Similar to the previous section, we first examine the effect of model generalization in a synthetic dataset setting. The synthetic dataset includes 40 features and four categorical variables, one-hot encoded into four bins. For the investigation, we fit 20 variations of each explained linear additive model with different hyperparameters. In Fig. 9, we can see that the average accuracy of all explanations of Linear Regression models decreases with an increase in model generalization.Footnote 11 All explanations of Logistic Regression and naive Bayes models have larger average accuracy for models with larger test accuracy.

Fig. 9

Linear relationship between average explanation accuracy and the generalization of (a) Linear Regression (b) Logistic Regression (c) Naive Bayes models in synthetic datasets. Pearson correlation values, together with the p-values, are included in the legend. Note that the visualization of linear regression shows the mean squared error instead of the test accuracy
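One plausible way to obtain such model variations is to vary the regularization strength, as sketched below; the hyperparameter grid, the regularized stand-ins, and the 70/30 split are assumptions, not necessarily the settings used in our experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import train_test_split

def fit_model_variations(X, y, task="classification", n_variations=20):
    """Fit several variations of a linear additive model with different regularization
    strengths and record their test-set performance (sklearn's default score)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    results = []
    for strength in np.logspace(-4, 4, n_variations):
        if task == "classification":
            model = LogisticRegression(C=strength, max_iter=1000)
        else:
            model = Ridge(alpha=strength)  # regularized stand-in for Linear Regression
        model.fit(X_tr, y_tr)
        results.append((strength, model.score(X_te, y_te)))
    return results

# Hypothetical usage: results = fit_model_variations(X, y, task="classification")
```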

We then examine whether the same relationship holds in our tabular datasets. In Fig. 10, we see that the average accuracy of LIME and SHAP explanations changes minimally with an increase in the generalization of Linear Regression models, while it decreases for LPI explanations of the same model. Moreover, an increase in model generalization negatively (positively) affects the accuracy of all local explanations of Logistic Regression (Naive Bayes) models.

Fig. 10

Linear relationship between average explanation accuracy and model generalization of (a) Linear Regression (b) Logistic Regression (c) Naive Bayes models for tabular datasets. Pearson Correlation values, together with the p-values, are included in the legend

The results from our synthetic and tabular datasets agree that the average accuracy of explanations of Naive Bayes models increases with larger model test accuracy. In this case, our results are aligned with the findings of Molnar et al. (2022) and Guidotti (2021). However, we see opposite trends for all explanations of Logistic Regression and for LPI explanations of Linear Regression models. We can conclude that, overall, the linear relationship between model generalization and explanation accuracy depends on the type of explained model and on the explanation technique itself, similar to the effect of the data shown in Sect. 5.2.2.

5.2.4 Explanation sample size

In explanation techniques such as LIME and SHAP, the sample size is a hyperparameter that controls the number of samples generated in the locality of each explained instanceFootnote 12. One plausible assumption is that the larger the sample size, the higher the explanation accuracy, since sampling can provide more information about the local neighborhood of each instance to the explanation technique and can therefore increase the accuracy of the local explanation.

In this section, we study the relationship between the average explanation accuracy and the explanation sample size of the LIME and SHAP techniques. For this experiment, we include a subset of datasets selected based on the number of features they contain. Figure 11 shows the results of our experiments. Our earlier hypothesis holds only for LIME explanations of the Delta A, Treasury, and Kin8NM datasets. Surprisingly, the average explanation accuracy of SHAP explanations of Linear Regression models is constant across the selected regression datasets for all sample sizes. When explaining Logistic Regression and Naive Bayes models, the average accuracy increases up to a sample size of 5000. However, the average accuracy of LIME and SHAP explanations decreases drastically when the sample size is increased from 5000 to 7000 in the Donors and Loan datasets. This trend does not appear for LIME and SHAP explanations of the Breast Cancer dataset, possibly because the Breast Cancer dataset only includes a few numerical features. We would like to reiterate that the sample size of 5000 used in our study, as mentioned in Sect. 5.1.2, was chosen to maximize the average explanation accuracy of these explanation techniques across all datasets and tasks.

Fig. 11

The relationship between the explanation sample size and the average explanation accuracy of LIME and SHAP across all datasets
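For concreteness, the sketch below shows where the sample size enters the standard tabular APIs of the two libraries; the model, dataset, and the use of KernelExplainer for SHAP are illustrative assumptions.

```python
import lime.lime_tabular
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
model = LogisticRegression(max_iter=5000).fit(data.data, data.target)
x_instance = data.data[0]

# LIME: num_samples controls how many perturbed samples are drawn around the instance.
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    data.data, feature_names=list(data.feature_names), mode="classification"
)
lime_exp = lime_explainer.explain_instance(x_instance, model.predict_proba, num_samples=5000)

# SHAP (KernelExplainer): nsamples controls the number of perturbation samples per explanation.
shap_explainer = shap.KernelExplainer(model.predict_proba, shap.sample(data.data, 100))
shap_values = shap_explainer.shap_values(x_instance, nsamples=5000)
```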

According to our results, the relationship between explanation sample size and accuracy is more complicated than our initial hypothesis suggests. It is possible that our result can be explained by the findings of Laugel et al. (2018), who showed that increasing the explanation sample size enlarges the neighborhood in the vicinity of an instance. As a result, the surrogate model’s decision boundary converges toward the global model’s decision boundary, the local explanations converge towards global explanations, and the local explanation accuracy therefore decreases.

5.2.5 Variance in explanation accuracy

So far, we have focused on reporting the average explanation accuracy for each dataset. In our previous experiments, some explanations showed significantly low average explanation accuracy values. In certain cases, this can be explained by a large variance in explanation accuracy. A large variance in explanation accuracy means that the explanation technique provides accurate explanations for a subset of instances while simultaneously providing inaccurate explanations for others. As mentioned in Sect. 4, since our evaluation technique operates at the level of single-instance explanations, we can measure the variance in explanation accuracy within each dataset.

In Fig. 12, we show, for each explained model, the top-10 datasets in which the standard deviation in the explanation accuracy of all explanations is largest on average. For example, LPI explanations show large standard deviations for Linear Regression models in datasets such as Treasury, Delta E, and Istanbul. Similar trends can be seen for the Logistic Regression explanations of the Thera, Churn, and HR datasets. The standard deviation in explanation accuracy is so large for LIME explanations of the Adult dataset that the explanations range from the maximum to the minimum accuracy in this dataset. Overall, the standard deviation of Naive Bayes explanations can be larger than that of the Logistic Regression explanations. Comparing the result in Fig. 12 with 1 also shows that a large standard deviation in explanations of Naive Bayes is common among datasets in which the model has achieved low generalization accuracy.

Fig. 12

Box-plots of explanation accuracy when the underlying model is Linear Regression (Top), Logistic Regression (Middle), and naive Bayes (Bottom). The dark rectangles are indicators of average values in each box plot

5.2.6 Choice of similarity measure

As discussed in Sect. 4.5, the wrong choice of similarity measure can lead to misleading results. In this section, we evaluate to what extent the choice of similarity measure affects which explanation technique is found to be the most accurate across all discussed models. Table 5 shows the average explanation accuracy of all explanations of Linear Regression across all datasets. Although SHAP outperforms the other explanations for all similarity measures, all explanations provide very similar average accuracy when Euclidean similarity is used. For example, LPI can be preferred over LIME when Euclidean similarity is the chosen similarity metric.

Table 5 The average explanation accuracy across all datasets for different similarity metrics when explaining the Linear Regression model

In Table 6, we can see that, for classification models, the choice of the most accurate explanation technique can depend heavily on the similarity metric used. This is why we have emphasized that awareness of the similarity metric used for evaluating local explanations is essential. Even though the choice of the correct similarity measure is highly dependent on the application scenario, in the context of tabular datasets we argue that rank-based measures such as Spearman’s rank correlation are the most appropriate, as proposed in Fong and Vedaldi (2017) and discussed earlier in Sect. 4.5.

Table 6 The average explanation accuracy across all datasets based on the different similarity measures
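To make the role of the measure concrete, the sketch below scores one hypothetical explanation vector against the corresponding MIAS scores under several common similarity measures; the exact set of measures and the mapping of Euclidean distance to a similarity are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cosine, euclidean
from scipy.stats import pearsonr, spearmanr

def similarity_scores(explanation: np.ndarray, mias: np.ndarray) -> dict:
    """Compare a local explanation to model-intrinsic additive scores under several measures."""
    return {
        "spearman": spearmanr(explanation, mias)[0],
        "pearson": pearsonr(explanation, mias)[0],
        "cosine": 1.0 - cosine(explanation, mias),
        "euclidean": 1.0 / (1.0 + euclidean(explanation, mias)),  # distance mapped to (0, 1]
    }

# Hypothetical importance vectors for a single explained instance:
print(similarity_scores(np.array([0.40, -0.10, 0.25, 0.05]),
                        np.array([0.35, -0.05, 0.30, 0.10])))
```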

5.2.7 Pre-processing effect

Since MIAS importance scores largely depend on the input feature values, we investigate the effect of pre-processing techniques on our explanation accuracy results. We found that the effect of pre-processing on average explanation accuracy is significant for the Logistic Regression and Naive Bayes explanations, even though their model generalization shows minimal change across pre-processing techniques. For this experiment, we should highlight that the pre-processing is performed before training the explained model, and the pre-processed data is used when obtaining the local explanation. Table 7 shows the test accuracy of the classification models for each pre-processing technique used. Note that pre-processing has little to no effect on the test accuracy of the two classification models.

Table 7 The test accuracy of Logistic Regression (LREG) and Naive Bayes (NB) models based on different preprocessing techniques used for each dataset
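A minimal sketch of this protocol is shown below: each scaler is fit before model training, and the scaled data would also be the input to the explanation step; the choice of scalers, dataset, and split is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scalers = {"min-max": MinMaxScaler(), "robust": RobustScaler(), "standard": StandardScaler()}
for name, scaler in scalers.items():
    # Pre-processing happens before training; explanations later use the same scaled data.
    X_tr_s, X_te_s = scaler.fit_transform(X_tr), scaler.transform(X_te)
    model = LogisticRegression(max_iter=1000).fit(X_tr_s, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te_s, y_te):.3f}")
```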

We compare the average explanation accuracy of all local explanation techniques under these pre-processing techniques and different similarity measures across all datasets. The first three columns in Table 8 show the average accuracy across all datasets when explaining the Logistic Regression model. SHAP provides the largest average explanation accuracy for explanations of both models, except when min-max scaling is used for Logistic Regression.

As mentioned earlier, in the case of tabular datasets, the most appropriate choice of similarity metric when comparing additive and non-additive explanations is Spearman’s rank correlation. However, the combination of similarity metric and pre-processing technique can significantly affect which explanation is found to be the most accurate. For example, using Euclidean similarity can lead to choosing LIME for Logistic Regression explanations when min-max and robust pre-processing are used.

Table 8 The average similarity across all datasets for each preprocessing technique

5.2.8 Explanation robustness

As mentioned earlier in Sect. 1, we have evaluated the local explanations by directly measuring their similarity to our proposed MIAS scores. As discussed in Sect. 2.2.1, most studies that evaluate local explanations have instead relied on robustness measures. In this section, we provide experiments that evaluate the robustness of local explanations of linear additive models. By doing so, we aim to investigate whether average robustness measures are in agreement with the average explanation accuracy (Tables 3 and 4).

As discussed earlier, the most popular measures of the robustness of local explanations are based on Deletion and Preservation (Hsieh et al. 2020; Montavon et al. 2018). In these measures, we progressively nullify the top-K percent most important (least important) features in the explained instance based on their importance in its local explanation. We then use the explained model as an oracle and obtain the change in its predicted probability score on the perturbed instance. As mentioned in Sect. 2.2.1, relatively large (small) values of the Deletion (Preservation) measure indicate that the explanation is robust. Figure 13 visualizes the Deletion and Preservation robustness measures of the Logistic Regression model on the Breast Cancer dataset, averaged over all instances. In this case, LPI provides the most robust explanations for Logistic Regression based on both measures.

To calculate an overall measure of robustness without relying on visualizations, Hsieh et al. (2020) proposed calculating the AUC of curves such as those in Fig. 13 using the trapezoidal rule: \(\text{AUC} = \sum_{i=1}^{n} \frac{y_i + y_{i-1}}{2}\,(x_i - x_{i-1})\). Based on this, robust explanations have the largest (smallest) AUC values with respect to the Deletion (Preservation) measure.
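The sketch below illustrates a Deletion/Preservation-style curve and the trapezoidal AUC above; nullifying features by setting them to zero, taking the absolute change in predicted probability, and using the fraction of removed features as the x-axis are simplifying assumptions.

```python
import numpy as np

def robustness_curve(predict_proba, x, importances, target_class, mode="deletion", baseline=0.0):
    """Change in predicted probability as the most (Deletion) or least (Preservation)
    important features are progressively nullified."""
    order = np.argsort(importances)
    if mode == "deletion":
        order = order[::-1]                              # most important features first
    p0 = predict_proba(x.reshape(1, -1))[0, target_class]
    x_perturbed = x.astype(float).copy()
    xs, ys = [0.0], [0.0]
    for i, idx in enumerate(order, start=1):
        x_perturbed[idx] = baseline                      # nullify the next feature
        p = predict_proba(x_perturbed.reshape(1, -1))[0, target_class]
        xs.append(i / len(order))                        # fraction of features nullified
        ys.append(abs(p0 - p))                           # change in predicted probability
    return np.array(xs), np.array(ys)

def trapezoidal_auc(xs, ys):
    """AUC = sum_i (y_i + y_{i-1}) / 2 * (x_i - x_{i-1})."""
    return float(np.sum((ys[1:] + ys[:-1]) / 2.0 * np.diff(xs)))

# Hypothetical usage: xs, ys = robustness_curve(model.predict_proba, x, importances, 1)
#                     auc = trapezoidal_auc(xs, ys)
```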

In Table 9, we can see that, based on this AUC measure, SHAP provides the most robust explanations on average across all datasets for Linear Regression. In Tables 10 and 11, we can see that LPI provides the most robust explanations on average for Logistic Regression models, as well as for the Naive Bayes model under the Preservation measure. On the other hand, SHAP provides the most robust explanations on average across all datasets for Naive Bayes models under the Deletion measure.

Not only do we see a different trend in the average explanation robustness across all datasets, but we also see many differences between explanation accuracy and explanation robustness across all of our models (see Tables 3 and 4). For example, the average robustness values of the explanations are equal for the HR, Insurance, Iris, and Loan datasets when explaining Naive Bayes models, whereas the average explanation accuracy differs across the explanations of Naive Bayes models. These differences are also visible in the explanation robustness of Linear Regression models: LIME explanations provide the largest average accuracy in the Delta A dataset, while SHAP and LPI are the most robust explanations with respect to the Preservation and Deletion measures. Based on our results, we can conclude that the most robust explanations of linear additive models based on Deletion and Preservation do not necessarily have the largest average accuracy, and vice versa.

Fig. 13

Robustness of local explanations of the Logistic Regression model for Deletion (left) and Preservation (right) on the Breast Cancer dataset

Table 9 Average robustness across all datasets based on the AUC measure for the Linear Regression model
Table 10 Average deletion robustness across all datasets based on the AUC measure
Table 11 Average robustness based on the preservation measure across all datasets

6 Discussion

We have identified some limitations of our study. Firstly, our conclusions are limited to explaining three linear additive models: Linear Regression, Logistic Regression, and Naive Bayes. Therefore, we cannot generalize these results to the accuracy of local explanations of more complex black-box models or other additive models. Secondly, our proposed functionally-grounded evaluation of local explanations cannot replace human-grounded evaluation procedures. As noted earlier, functionally-grounded evaluation methods, such as the one presented here, can only be seen as a means of making the assessment of new candidate techniques more efficient: they allow some candidate techniques to be rejected early on. Even if an explanation technique passes our sanity check, as LIME and SHAP explanations of Linear Regression do on our regression datasets, it will likely still need to be qualitatively evaluated in a user-centered context later.

Since our study focused on evaluating local explanations of linear additive models, the reader may wonder whether the (in)accuracy of explanations for linear additive models can tell us anything about the accuracy of local explanations of more complex models. Even though good performance on simpler models does not necessarily transfer to good performance on complex models, we argue that if an explanation technique is not accurate for simpler models, it is very unlikely to achieve high accuracy on complex models.

7 Concluding remarks

Our study proposed a sanity check that examines whether local additive model-agnostic explanations can provide accurate explanations for linear additive models. The evaluation was based on extracting Model-Intrinsic Additive Scores (MIAS) from linear additive models such as Linear Regression, Logistic Regression, and Naive Bayes. We then measured explanation accuracy as the similarity, using Spearman’s rank correlation, between the local explanations and our proposed scores.
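As a compact recap of the procedure, the sketch below extracts per-feature contributions from a fitted Linear Regression model and scores a SHAP explanation against them with Spearman’s rank correlation. Treating the terms \(w_i x_i\) as the MIAS of a linear model, and using KernelExplainer on a public dataset, are simplifying assumptions for illustration.

```python
import shap
from scipy.stats import spearmanr
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
model = LinearRegression().fit(X, y)
x_instance = X[0]

# Model-intrinsic additive scores of a linear model: per-feature contributions w_i * x_i.
mias = model.coef_ * x_instance

# A local SHAP explanation of the same instance.
explainer = shap.KernelExplainer(model.predict, shap.sample(X, 100))
shap_scores = explainer.shap_values(x_instance, nsamples=2000)

# Explanation accuracy as Spearman's rank correlation between explanation and MIAS.
print(f"Explanation accuracy (Spearman): {spearmanr(shap_scores, mias)[0]:.3f}")
```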

It may seem intuitive to assume that local additive explanations would provide high-accuracy explanations of linear additive models and pass our sanity check. However, we showed that this is not always the case. While LIME and SHAP explanations of Linear Regression models do pass our sanity check, they fail to provide accurate local explanations across numerous datasets for Logistic Regression and Naive Bayes models. We conclude that, in their current state, these local explanations cannot be trusted for classification tasks in high-stake decision-making cases.

One possible explanation for why these explanations fail our sanity check for Logistic Regression and Naive Bayes models is that both LIME and SHAP explain the predicted probability scores of a designated class using linear regression surrogates. To address this, we suggest investigating the use of classification surrogates in these explanations when explaining classification models. Another possible explanation is that Logistic Regression and Naive Bayes models have more complex decision boundaries than Linear Regression models. For this reason, we suggest that future studies investigate the relationship between model complexity and the accuracy of local explanations. We hope that future studies can use our evaluation method and sanity check in this endeavour.

In our study, we also examined whether additive explanations such as LIME and SHAP are more accurate than non-additive LPI explanations when explaining linear additive models. Our empirical investigation showed that while this is true for Linear Regression models, LPI explanations have larger average accuracy in a subset of our studied datasets for Logistic Regression and Naive Bayes models. Therefore, we can conclude that in some cases, the additivity of explanations is not necessarily an advantage for explaining linear additive models.

We provided an extensive analysis of the factors that may affect explanation accuracy. Our results show that the accuracy of the explanation techniques can depend on the number of numerical and categorical features, pairwise feature correlation, model generalization, similarity metric, pre-processing techniques, and explanation sample size. We showed that the effect of these factors on explanation accuracy is highly dependent on the type of model we explain and the explanation technique. Using this knowledge, we can set control mechanisms for the factors affecting each explanation when evaluating local explanations.

In their current state, LIME and SHAP have no criteria for deciding when they should not provide explanations. Based on the significant standard deviation in the explanation accuracy observed for linear additive models, we argue that these techniques need internal mechanisms to abstain from explaining when it is uncertain that they can achieve high accuracy at the dataset level.

Our evaluation method requires that the prediction function of a linear additive classification model can be transformed into a linear additive form. In principle, just like our proposed log odds trick (Sects. 4.3 and 4.4), by transforming the prediction function of any machine learning model into a linear additive form, our evaluation method can be used to calculate the accuracy of local explanations directly. One important direction for future research is to extend the proposed evaluation framework to other model classes, e.g., tree models, and explanation types, e.g., rules, as produced by Anchors (Ribeiro et al. 2018). One major challenge here is to derive the model-intrinsic feature importance scores in cases where intrinsic additive structures are not as easily derivable as they are for logistic regression and naive Bayes.
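As an illustration of the kind of transformation meant here (using the standard logistic regression identity rather than the exact formulation of Sects. 4.3 and 4.4), the predicted probability can be rewritten in a linear additive form via the log odds:

\[
\log \frac{p(y=1 \mid \mathbf{x})}{1 - p(y=1 \mid \mathbf{x})} = \beta_0 + \sum_{i=1}^{d} \beta_i x_i ,
\]

where each term \(\beta_i x_i\) can then be read as the model-intrinsic additive contribution of feature \(i\) to the explained prediction.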

Another important direction for future studies is to evaluate the accuracy of local explanations of linear additive models for other modalities of data, such as text and images. In those cases, the challenge is finding datasets on which linear additive models achieve acceptable predictive accuracy before obtaining their local explanations. Since our evaluation method is designed for local additive explanations, we do not recommend its use for evaluating local explanations that are not intrinsically additive.