1 Introduction

For machine learning (ML) models, data and results, there is a demand for transparency, ease of understanding and explanations [24] to satisfy a citizen’s “right to explanation” in the European Union [20] and to meet health care requirements for justification and explanation [7, 22].

Without quantitative measures of transparency and understandability, doctors (or users) will select models that maximize accuracy but may unnecessarily or unintentionally neglect or sacrifice transparency and understandability, or they will choose models in an ad hoc manner to try to meet all criteria. We refer to the transparency and understandability of models as inherent model interpretability, defined further in Sect. 3.

We propose criteria and measures of inherent model interpretability to help a doctor select ML models (Table 1 steps 1 and 2) which are more transparent and understandable, in a quantitative and objective manner. More transparent models can offer additional views of results (Table 1 step 3) for interpretation. Our measures facilitate the inclusion of better models as candidates and the selection of better models for use.

Table 1. Measures of inherent model interpretability facilitate model selection (bold text) in steps 1 and 2.

Some of our proposed measures are specific to support vector machines (SVM), as one popular ML method. We perform experiments to validate the SVM measures against a set of propositions and evaluate their utility by concordance or matched pair agreement.

Notably, the proposed measures do not provide an interpretation or explanation. They also do not indicate how useful or meaningful a model is in the context of data. For example, a model that always classifies patient data as belonging to the positive class is very understandable (interpretable). We can easily construct the explanation of the model and result—all patients are classified as positive—but that does not mean that the model is useful, meaningful, appropriate, or unbiased. Accuracy and common sense address the latter issues. The proposed measures only indicate how understandable a model is, i.e., how likely we are able to provide an interpretation, as the necessary basis for subsequent explanation.

Making ML more interpretable facilitates its use in health care, because there is a perception that ML is a black box [31] lacking interpretability, which inhibits its use. Greater use is important because, for a good number of health care problems and data, ML methods offer better accuracy in classification [12, 15, 41] than common alternatives such as statistical methods, decision trees, rule-based methods and instance-based methods. Interpretable ML also facilitates research on models and model fit.

2 Notation

A machine learning task begins with data in a matrix X consisting of N instances \(\underline{x}_{i}\) which are vectors, each containing \(n\) features.

$$\begin{aligned} X = \left[ \underline{x}_{1}, \, \underline{x}_{2}, \, \ldots , \underline{x}_{N}\right] ^{\text {T}}\qquad \quad \underline{x}_{i}\in \mathbb {R}^{n} \end{aligned}$$
(1)

Entry \(x_{i,j}\) in the matrix is the \(j^{th}\) feature of instance \(\underline{x}_{i}\). We assume real-valued features, converting any atomic data type to reals as needed (Appendix A).

A supervised learning task also has N targets (or outcomes) in a vector \(\underline{y}\) which are binary in classification,

$$\begin{aligned} \underline{y} =\left[ y_{1},\, y_{2}, \, \ldots , y_{N}\right] ^{\text {T}}\qquad \quad y_{i}\in \left\{ -1,\,+1\right\} \end{aligned}$$
(2)

or continuous in regression:

$$\begin{aligned} \underline{y} =\left[ y_{1}, \, y_{2}, \, \ldots , y_{N}\right] ^{\text {T}}\qquad \quad y_{i}\in \mathbb {R} \end{aligned}$$
(3)

In binary classification there are \(N^{+}\) instances in the positive class and \(N^{-}\) instances in the negative class.
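For concreteness, here is a small illustrative encoding of this notation in Python; the values are arbitrary and only serve to fix shapes and symbols.

```python
import numpy as np

# Illustrative data in the notation above: N = 4 instances, n = 3 real-valued features.
X = np.array([[ 0.2,  1.5, -0.7],
              [ 1.1,  0.3,  0.9],
              [-0.4,  2.2,  0.1],
              [ 0.8, -1.0,  1.6]])   # shape (N, n); row i holds instance x_i

y = np.array([+1, -1, +1, -1])       # binary targets, y_i in {-1, +1}
N_pos, N_neg = int(np.sum(y == +1)), int(np.sum(y == -1))   # N+ and N-
```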

Fig. 1. A model consists of a learning method, SVM in this case, and all of its associated parts as depicted above. Most machine learning and statistical models (or classifiers) have an underlying continuous output that most accurately describes the model’s behaviour.

We refer to a posterior model (e.g., Fig. 1), or simply model, as a learning method (e.g., SVM, neural networks) with all of its associated learning/estimation functions (e.g., kernels and transfer functions), hyperparameters, structure (e.g., layers, connections, components in a composite kernel), constraints and learned model weights, in the context of specific data. A model only learns from, and has meaning in, the context of specific data.

We refer to an initial model as a model in the context of specific data with initial model weights prior to learning/iteration.

We refer to a family of models, or a prior model, as the set of models possible when hyperparameters are variables (not specified)—e.g., SVM with a Gaussian RBF kernel with unspecified box constraint and kernel width.

The prior, initial and posterior models are available at different points in the machine learning and/or statistical learning process (Fig. 2).

Fig. 2. We measure inherent model interpretability at several points (dashed arrows) in the process of machine learning and/or statistical learning (partially derived from [25]). Note: some steps may not apply to some methods and models.

Other notation is introduced in the context of discussion.

3 Inherent Model Interpretability Concept and Measures

We propose the concept of inherent model interpretability as distinguished from an individual’s understanding and we propose two measures for any learning method or model with numeric inputs.

Feynman said that if we understand a concept we must be able to describe it at a freshman level, which often requires simplification or reduction, otherwise we don’t really understand it [21]. Badii et al. express that complexity is closely related to understanding and that understanding comes from accurate models which use condensed information or reduction schemes [4]. Miller indicates that selection is a key attribute of explanations [38]. Hence, we posit that the simpler a model is, the easier it is to understand, interpret and describe, with all other aspects of the model being equal. This leads to the following general measure.

3.1 A General Measure of Inherent Model Interpretability

As stated above, the simpler a model is, the more interpretable it is, inherently. Formally, we propose the following definition.

Definition 1

Inherent model interpretability (or understandability) U, is a measure with range \(\left[ 0,1\right] \) based on either: a measure of model transparency T in the same range, the inverse of semi-infinite model complexity \(H_{\infty }\), or the inverse of finite model complexity \(H_{b}\), respectively as follows:

$$\begin{aligned} U=&{\left\{ \begin{array}{ll} T &{} {(i)}\;\;T\;\;\;\,\in \left[ 0,\,1\right] \\ \frac{1}{1+\left( H_{\infty }-a\right) } &{} {(ii) }\,H_{\infty }\in \left[ a,\,\infty \right) \quad a\in \mathbb {R}^{+};\;a<\infty \\ 1-\left( \frac{H_{b}-a}{b-a}\right) &{} {(iii) }H_{b}\in \left[ a,\,b\right] \quad \;\,a,\,b\in \mathbb {R}^{+};\;a,\,b<\infty \end{array}\right. } \end{aligned}$$
(4)

where:

  • \(H_{\infty }\) and \(H_{b}\) are measures of model complexity based on parts [4] in the categories of information, entropy, code length or dimension [33],

  • inherent indicates that the measure is independent of an individual, e.g., their specific learning and forgetting curves [44], and

  • the multiplicative inverse [29] in (4)ii or the additive inverse [57] in (4)iii is applied for an absolute or relative measure, respectively, according to the comparison required. The relative measure is preferred where applicable since it is more intuitive and interpretable (not shown).

    • e.g., to compare a set of models where the range \(\left[ a,\,b\right] \) is known to encompass them all, a relative measure (iii) is fine, however, to compare them to any future model where the maximum b is not known, use an absolute measure (ii), i.e., let \(b=\infty \).

The separation of model interpretability into at least two parts, one part that is inherent to the model (and data) and another part that depends on the individual, aligns with the functionally-grounded approach [17].

In order to use this general measure, one must further define T, \(H_{\infty }\) or \(H_{b}\), as we do in subsequent sections. We note also that measurement may be performed prior to, initially at, or posterior to, optimizing the model weights (Fig. 2).
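To make the three cases of (4) concrete, the following sketch maps a transparency score or a complexity value onto U; the function names are ours (hypothetical), not from the paper.

```python
# A minimal sketch of Eq. (4); function names are illustrative, not from the paper.

def u_from_transparency(T):
    """(4)i: U = T for a transparency measure T in [0, 1]."""
    assert 0.0 <= T <= 1.0
    return T

def u_from_semi_infinite_complexity(H_inf, a):
    """(4)ii: absolute measure via the multiplicative inverse,
    U = 1 / (1 + (H_inf - a)) for H_inf in [a, infinity)."""
    assert H_inf >= a
    return 1.0 / (1.0 + (H_inf - a))

def u_from_finite_complexity(H_b, a, b):
    """(4)iii: relative measure via the additive inverse,
    U = 1 - (H_b - a) / (b - a) for H_b in [a, b]."""
    assert a <= H_b <= b
    return 1.0 - (H_b - a) / (b - a)

# e.g., a complexity of 5 on a known scale [3, 10] vs. an open-ended scale starting at 3
u_rel = u_from_finite_complexity(H_b=5, a=3, b=10)       # ~0.71
u_abs = u_from_semi_infinite_complexity(H_inf=5, a=3)    # ~0.33
```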

3.2 A New Measure: Simplicity of Output Sensitivity

We consider the continuous underlying output of a classifier (e.g., Fig. 1) to be the most accurate representation of its behaviour. It is available for most learning classifiers, in machine learning or statistical learning, such as neural networks, SVM, logistic regression and naive Bayes. It is also facilitated by most implementations; e.g., for SVM it is available in Matlab, R, Python, SPSS, Weka, libsvm and Orange, where the output may be the probability of the positive class or a non-probabilistic value, e.g., a “classification score”.

Some measure or analyze a classifier’s behaviour based on its binary output instead [46]—this approach lacks fine-grained behavioural information. Others measure classifier behaviour by modeling its responses with a separate explanation model that provides a continuous output [5, 46]—this post hoc approach may not meet untested legal, assurance or business requirements.

We use the underlying continuous output, and logic similar to that of the previous measure, to posit that:

If a model is uniformly sensitive in its output to changing values in input features and instances, then its sensitivity is simple to describe, understand and interpret (as one value). Conversely, a model that is differently sensitive to each feature and instance is more difficult to describe, understand and interpret, in those terms or from that perspective. Formally, we propose the following definition:

Definition 2

The simplicity of output sensitivity \(U_{Hs}\) is a measure of inherent model interpretability. It describes the simplicity of the sensitivity of the model’s continuous output (e.g., Fig. 1) to changes in input. It is specified as the inverse of Shannon entropy \(H_{\text {s}}\) with a finite range (4)iii, repeated below:

$$\begin{aligned} U_{H\text {s}}&=1-\left( \frac{H_{\text {s}}}{H_{\text {max}}}\right) \qquad H_{\text {s}}\in \left[ 0,\,H_{\text {max}}\right] \end{aligned}$$
(5)
$$\begin{aligned} H_{\text {s}}&=-\sum _{i=1}^{N_{s}}f_{i}\left( s\right) \log f_{i}\left( s\right) \end{aligned}$$
(6)
$$\begin{aligned} H_{\text {max}}&=-\sum _{i=1}^{\left| s\right| }\frac{1}{\left| s\right| }\log \frac{1}{\left| s\right| } \end{aligned}$$
(7)

where s is the set of sensitivities \(S_{j,q}\) of the model’s continuous output \(\hat{y}_{\text {c}}\) (the underlying value for a classifier) to small changes \(\varepsilon =\left( 0.1\right) \cdot 3\sigma \) in each input instance j, one feature q at a time,

$$\begin{aligned} s&=\left\{ S_{j,q}\right\} \end{aligned}$$
(8)
$$\begin{aligned} S_{j,q}&=\frac{\hat{y}_{\text {c}}\left( \underline{x}_{j}+\underline{\varepsilon }_{q}\right) -\hat{y}_{\text {c}}\left( \underline{x}_{j}-\underline{\varepsilon }_{q}\right) }{2\varepsilon }\\ \underline{\varepsilon }_{q}&=\left[ \,\ldots \; 0 \;\; \varepsilon \;\; 0 \;\ldots \,\right] ^{\text {T}}\qquad \varepsilon \text { in the } q^{\text {th}} \text { cell}\nonumber \end{aligned}$$
(9)

and where \(N_{s}\) is the number of bins according to standard binning methods for histograms [18, 47, 53].
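The following sketch illustrates one way to compute \(U_{Hs}\) from Eqs. (5)–(9), assuming that \(\sigma \) is the per-feature standard deviation, that no feature is constant, and that the classifier exposes its continuous output through a callable we name predict_continuous (a hypothetical helper, e.g., an SVM decision function). It is our illustration, not the authors’ code.

```python
import numpy as np

def simplicity_of_sensitivity(predict_continuous, X, bins="sturges"):
    """Sketch of U_Hs per Eqs. (5)-(9). `predict_continuous` maps an (N, n) array to
    the model's continuous output; `bins` may be 'sturges', 'scott' or 'fd' as in
    Sect. 6. Assumes sigma is the per-feature standard deviation and no feature is
    constant."""
    N, n = X.shape
    eps = 0.1 * 3.0 * X.std(axis=0)            # epsilon = (0.1) * 3*sigma, per feature

    sensitivities = []
    for q in range(n):                          # perturb one feature at a time
        e_q = np.zeros(n)
        e_q[q] = eps[q]
        # first-order central difference, Eq. (9), for every instance j at once
        S_q = (predict_continuous(X + e_q) - predict_continuous(X - e_q)) / (2.0 * eps[q])
        sensitivities.append(S_q)
    s = np.concatenate(sensitivities)           # the set s = {S_jq}, Eq. (8)

    counts, _ = np.histogram(s, bins=bins)      # N_s bins from a standard binning rule
    f = counts[counts > 0] / counts.sum()       # relative frequencies f_i(s)
    H_s = -np.sum(f * np.log(f))                # Eq. (6)
    H_max = np.log(len(s))                      # Eq. (7): |s| equiprobable outcomes
    return 1.0 - H_s / H_max                    # Eq. (5)
```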

We use entropy to measure the global complexity of sensitivities across the space of input data. In the literature, entropy has been applied quite differently, to measure the information loss of perturbed features and thereby indicate their influence; we use entropy instead to measure the complexity of the influence of perturbed features.

Our measure uses a first-order central difference (a first-derivative approximation) as a standard and easy to understand approach to sensitivity that does not require knowing or differentiating the model’s formulas. We can generalize this idea to second- and third-order differences/derivatives, and so on, like the derivatives in deep Taylor decomposition [39], but the latter requires a model’s formulas and derivatives. Whereas [39] examines the local behaviours of a model, we do that and additionally compute the complexity of the resulting values.

We treat the entries \(S_{j,q}\) as a set or random variable s (8) because we are measuring model interpretability overall, across features and instances, not within a feature nor within an instance.

We note that instead of Shannon entropy, it may be possible to apply other types of entropy, such as Renyi entropy, Tsallis entropy, effective entropy or total information [19, 45, 56], and/or Kullback-Leibler (K-L) divergence [14]; however, such a change would require validation. Prior to this study we experimented with discrete K-L divergence, as implemented by four measures in the ITK toolkit [54, 55], as an alternative to Shannon entropy. Our experimental results with K-L divergence did not sufficiently match our expectations, so we focused on Shannon entropy as a more popular and credible measure.

We also implemented differential entropy [14], the continuous version of entropy, which is defined as the K-L divergence from a uniform probability density function (pdf) to the pdf of interest. We set it aside based on the K-L divergence results mentioned above, and because it is more computationally intensive, requiring a kernel density estimate.

Finally we note that the sensitivity portion of our measure (i.e., entropy aspect aside) differs from how other authors compute sensitivity globally across both instances and features [27].

Table 2. We identify criteria for model interpretability in the literature and translate these into proposed criteria which are objective rather than subjective.

4 Criteria for Model Transparency and a Measure for SVM

We identify criteria for model transparency from the literature (Table 2) for any model, and propose new criteria in most cases, which are objective, not subjective, and thus suitable for a (quantitative) measure of model transparency.

We apply the proposed criteria (Table 2), which hold for any model, to create a measure specific to kernel methods or support vector machines (SVM).

We use the seven proposed criteria for inherent prior model interpretability (Sect. 4) to define six Dirac (binary) measures for SVM (Table 3), meeting each criterion without overlap, except for criterion d (since all SVM kernels are generalized linear models).

We define an overall measure as follows:

$$ \check{U}_{\partial }=\nicefrac {1}{6}\left( \partial _{\text {essep}}+\partial _{\text {fin}}+\partial _{\text {eM}}+\partial _{\times }+\partial _{\text {uni}}+\partial _{\text {adm}}\right) $$

A benefit of this measure is that, being independent of the data, it requires little computation and informs model selection prior to optimization.
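As an illustration of how the overall score is assembled, the sketch below averages six 0/1 indicators; the flag values in the usage lines are made up, since the per-kernel assignments of Table 3 are not reproduced here.

```python
# A minimal sketch of the display above: U_partial is the mean of six Dirac measures.
def kernel_transparency(d_essep, d_fin, d_eM, d_cross, d_uni, d_adm):
    flags = (d_essep, d_fin, d_eM, d_cross, d_uni, d_adm)
    return sum(1 for d in flags if d) / 6.0

# Hypothetical scoring of two kernels (illustrative flag values only, not Table 3's):
u_linear = kernel_transparency(1, 1, 1, 1, 1, 1)   # 1.0
u_rbf    = kernel_transparency(1, 1, 0, 0, 1, 1)   # ~0.67 (made-up flags)
```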

Table 3. For kernel methods, e.g., SVM, we propose the following Dirac (binary) measures \(\partial \) of model transparency T. Let \(\mathscr {X}_{\text {T}}\) be the space of transparent features derived from simple transforms of the original features \(\mathscr {X}\) which are not highly engineered: i.e., given data \(\mathscr {X}\,{=}\,\left\{ x\right\} \), let \(\mathscr {X}_{\text {T}}\,{=}\,\left\{ x, -x, \frac{1}{x}, \log \left( x\right) , \tanh \left( x\right) , \min \left( c_{\text {top}},x\right) , \max \left( c_{\text {bottom}},x\right) \right\} \).

5 Creating More Measures Specific to SVM

In this section we propose measures specific to SVM.

Support Vectors: In SVM, a subset of the patients in the data set are key to defining the model. They are known as support vectors since they support the definition of the model’s class boundary and decision surface. For example, the decision regarding whether a patient has a disease or not is determined by a subset of patients, e.g., 5 out of 200, that the model learned/picked as positive and negative examples of disease.

The more support vectors there are, the more complex the model is, with all other things being equal: \(H_{sv}=sv\). SVM models have at least three support vectors in general—at least two to define the line, curve, hyperplane or surface that is the class boundary, and at least one to define the margin, so \(sv\ge 3,\;sv\,{\in }\,\mathbb {N}\).

To select a model for one data set, or to compare results between two data sets, we know the maximum number of patients N, so \(sv\le N\), and we apply (4)iii to obtain a relative measure, \(U_{sv,r}\). Or to obtain an absolute measure \(U_{sv,a}\), to compare against any current or future data set, we assume \(N=\infty \) and apply (4)ii.
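A minimal sketch of both forms, assuming a = 3 (the minimum number of support vectors) as the lower bound and b = N for the relative case:

```python
def u_sv_relative(sv, N):
    """U_sv,r via (4)iii with H_b = sv, a = 3, b = N (known maximum)."""
    return 1.0 - (sv - 3) / (N - 3)

def u_sv_absolute(sv):
    """U_sv,a via (4)ii with H_inf = sv, a = 3 (no known maximum, b -> infinity)."""
    return 1.0 / (1.0 + (sv - 3))

# e.g., 25 support vectors among N = 200 patients
u_rel = u_sv_relative(sv=25, N=200)   # ~0.89
u_abs = u_sv_absolute(sv=25)          # ~0.04
```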

Degrees of Freedom: Akaike includes all method and kernel hyperparameters and weights among the degrees of freedom [50]. We calculate the prior complexity measure \(\check{H}_{dof}\) with three terms: the number of SVM hyperparameters (e.g., 1 for C), the number of kernel hyperparameters (e.g., 1 for the kernel width of a Gaussian RBF kernel), and the number of independent inputs (e.g., 1 for a Gaussian RBF or other stationary kernel, 2 otherwise). We calculate the posterior complexity measure \(H_{dof}\) with an additional term for the support vectors, and apply the general measure for model interpretability.

$$\begin{aligned} \check{H}_{dof}&=\check{dof}=d_\mathrm{{SVM\_hyp}}+d_\mathrm{{kernel\_hyp}}+d_\mathrm{{input}}\\ H_{dof}&=dof=d_\mathrm{{SVM\_hyp}}+d_\mathrm{{kernel\_hyp}}+d_\mathrm{{input}}+sv \end{aligned}$$
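For concreteness, a small worked sketch for an SVM with a Gaussian RBF kernel, following our reading of the counts above; the support vector count is hypothetical.

```python
# Prior complexity: C (1) + kernel width (1) + independent inputs (1, stationary kernel)
d_svm_hyp, d_kernel_hyp, d_input = 1, 1, 1
check_H_dof = d_svm_hyp + d_kernel_hyp + d_input   # prior dof = 3

sv = 25                                            # hypothetical posterior support vector count
H_dof = check_H_dof + sv                           # posterior dof = 28
# Either value can then be mapped to an interpretability score with Eq. (4)ii or (4)iii.
```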

Relevant Dimensionality Estimate: The relevant dimensionality estimate (rde) [9] provides a way to measure the complexity of the SVM feature space induced by a kernel. There are two complexity measures \(H_{rde\text {T}}\) and \(H_{rde\text {L}}\) corresponding to two rde methods: the two-component model and the leave-one-out method, respectively.

6 Validation of Measures

We validate our proposed measures with sanity checks on formulas (not shown) and by agreement with propositions that describe our expectations and knowledge about model complexity and interpretability.

We create propositions based on expected relationships between measures, and check/test the propositions with a statement P and its inverse \(\mathbf {P}^{-1}\) such as the following,

$$\begin{aligned} \mathbf {P}{:}\;\; \check{dof} _{1}\le \check{dof} _{2}\;\;&\overset{\text {usually}}{\rightarrow }U_{rde1}^{*}\ge U_{rde2}^{*}\end{aligned}$$
(10)
$$\begin{aligned} \mathbf {P}^{-1}{:}\;\; \check{dof} _{1}> \check{dof} _{2}\;\;&\overset{\text {usually}}{\rightarrow }U_{rde1}^{*}<U_{rde2}^{*} \end{aligned}$$
(11)

where \(\overset{\text {usually}}{\rightarrow }\) is a notation that means “implies the majority of the time”. For brevity \(\mathbf {P}^{-1}\) is implied but not shown in statements that follow. We measure how much our results agree with these propositions using either Kendall’s W coefficient of rank correlation [26] or matched pair agreement [48], where the latter is applied to control for confounding factors.

If a proposition is robust, then the concordance coefficient or matched-pair agreement percentage indicates how correct and useful the measure is, from that perspective. A measure has some utility if it is correct the majority of the time, for different models/kernels and data sets, with a confidence interval that does not include 50%.
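A sketch of both checks, under our reading of the validation procedure: matched-pair agreement is the fraction of pairs for which a proposition (or its inverse) holds, reported with a normal-approximation 95% confidence interval, and Kendall’s W is computed for m rankings of n items without tie correction. The function names are ours.

```python
import numpy as np
from scipy.stats import rankdata

def matched_pair_agreement(lhs_holds, rhs_holds):
    """Fraction of matched pairs where the truth of the left-hand side matches the
    truth of the right-hand side (covering both P and its inverse), with a
    normal-approximation 95% confidence interval."""
    lhs, rhs = np.asarray(lhs_holds, bool), np.asarray(rhs_holds, bool)
    agree = (lhs == rhs)
    p = agree.mean()
    half = 1.96 * np.sqrt(p * (1.0 - p) / agree.size)
    return p, (p - half, p + half)

def kendalls_w(rankings):
    """Kendall's W coefficient of concordance for an (m, n) array-like of m rankings
    of n items; no correction for ties."""
    ranks = np.vstack([rankdata(r) for r in rankings])
    m, n = ranks.shape
    R = ranks.sum(axis=0)                      # total rank of each item
    S = np.sum((R - R.mean()) ** 2)
    return 12.0 * S / (m ** 2 * (n ** 3 - n))
```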

We validate our propositions using two types of experiments (#1 and #2 below). We run each experiment five times on each of three data sets from the University of California at Irvine repository: the Statlog Heart, Hepatitis and Bupa Liver data sets. Missing data in the Hepatitis data set are imputed with Stata, taking one of three multiple imputations generated with Markov chain Monte Carlo. Bupa Liver is used with the common target [36] rather than the clinically meaningful target.

  • Experiment Type #1: For each of 90 points chosen randomly in the hyperparameter space, we form a matched pair of models [48] that differ by one hyperparameter/dof, which is fixed in one model and free in the other, and we check propositions as the percentage of pairs for which they hold true. We use 3 pairs of kernels that differ by a single dof, e.g., a polynomial kernel of varying degree versus a linear kernel, a Gaussian RBF kernel with/without a fixed kernel width, and a Mercer sigmoid kernel [11] with/without a fixed horizontal shift.

  • Experiment Type #2: From experiment type #1 we identify three points in the hyperparameter space which perform well for each kernel. For each of the 3 fixed points, we choose 30 values of C equally spaced (as logarithms) throughout the range from \(10^{-3}\) to \(10^{6}\) and check propositions as the concordance of the left-hand side with the right-hand side, using Kendall’s W coefficient of concordance. If the right-hand side should have the opposite rank to the left-hand side, then we negate the measure on the right-hand side so that concordance measures agreement of rank. We use the following kernels: linear, polynomial, Gaussian RBF and Mercer sigmoid [11].

6.1 Propositions

Proposition 1

The majority of the time we expect that a model with fewer degrees of freedom \( \check{dof} _{1}\), with all other things being equal when compared to another model with \( \check{dof} _{2}\), will be simpler and have a relevant dimensionality estimate (rde) [9] that is less than or equal to that of the other model, and will therefore be more interpretable/understandable (\(U_{rde}^{*}\)):

$$\begin{aligned} \mathbf {1a}:\;\; \check{dof} _{1}\le \check{dof} _{2}\;\;&\overset{{usually}}{\rightarrow }rde_{1}\le rde_{2}\end{aligned}$$
(12)
$$\begin{aligned} \mathbf {1b}:\;\; \check{dof} _{1}\le \check{dof} _{2}\;\;&\overset{{usually}}{\rightarrow }U_{rde1}^{*}\ge U_{rde2}^{*} \end{aligned}$$
(13)

This applies to rde with the two-component model (rdeT) and the leave-one-out method (rdeL).

Proposition 2

In SVM, the hyperparameter C is called the box constraint or cost of error. Authors have remarked [49, Remark 7.31] that C is not an intuitive parameter, although it has a lower bound for use, \(C\ge \frac{1}{N}\), and its behaviour suggests \(C\doteq \frac{1}{\nu N}\), where \(\nu \) is a proportion of support vectors. We therefore expect that a model with a higher value \(C_{1}\), versus a second model with \(C_{2}\), will have fewer support vectors (sv) and consequently be more interpretable/understandable (\(U_{Hs}\)):

$$\begin{aligned} \mathbf {2a}:\;\;C_{1}\ge C_{2}\;\;&\overset{{usually}}{\rightarrow }\;\;sv_{1}\le sv_{2}\end{aligned}$$
(14)
$$\begin{aligned} \mathbf {2b}:\;\;sv_{1}\le sv_{2}\;\;&\overset{{usually}}{\rightarrow }\;\;U_{Hs1}\ge U_{Hs2}\end{aligned}$$
(15)
$$\begin{aligned} \mathbf {2c}:\;\;C_{1}>C_{2}\;\;&\overset{{usually}}{\rightarrow }\;\;U_{sv,a1}\ge U_{sv,a2}\end{aligned}$$
(16)
$$\begin{aligned} \mathbf {2d}:\;\;C_{1}>C_{2}\;\;&\overset{{usually}}{\rightarrow }\;\;U_{Hs1}\ge U_{Hs2} \end{aligned}$$
(17)

This applies to simplicity of sensitivity \(U_{Hs}\) with any binning method.

Our experiment uses three binning methods: Scott \(U_{H\text {sc}}\), Freedman-Diaconis \(U_{H\text {fd}}\) and Sturges \(U_{H\text {st}}\).

Proposition 3

The majority of the time we expect that, if a prior measure is useful, then it reflects the same rankings as the posterior measure,

$$\begin{aligned} \mathbf {3}:\;\;U_{Hs1}^{*}\le U_{Hs2}^{*}&\;\;\overset{{usually}}{\rightarrow }\;\,U_{Hs1}\le U_{Hs2} \end{aligned}$$
(18)

Proposition 4

We expect that the linear kernel is the simplest of all kernels, with greater transparency than other kernels such as the polynomial, Gaussian RBF, sigmoid and Mercer sigmoid kernels, whereby,

$$\begin{aligned} \mathbf {4}:\;\;\text {isLinear}\left( k_{1}\right) >\text {isLinear}\left( k_{2}\right) \;\;\rightarrow \;\,\check{U}_{\partial 1}>\check{U}_{\partial 2} \end{aligned}$$
(19)

7 Results

We summarize the results of our validation tests (Tables 4 and 5) as follows: we recommend \(\check{U}_{\partial }\) and \(U_{sv}\) as good measures. We find that \(U_{rdeT}^{*}\), \(U_{rdeL}^{*}\) and \(U_{H\text {st}}\) are measures which are of limited use, because they may be wrong one third of the time when providing guidance on decisions. \(U_{H\text {sc}}\) and \(U_{H\text {fd}}\) are not distinguished from chance by our propositions and are therefore not recommended. If \(U_{H\text {st}}\) is validated to a greater degree in the future, then the initial measure \(U_{H\text {st}}^{*}\) has been shown to be a good proxy for it, incurring some loss of information (Table 5).

Table 4. The results from propositions using experiment type #2 validate the support vector measure \(U_{\text {sv}}\) and simplicity of sensitivity measure with Sturges binning \(U_{H\text {st}}\).
Table 5. The results from propositions using experiment #1 validate the relevant dimensionality measures \(rde\text {T}\) and \(rde\text {L}\), the initial model interpretability measures based on relevant dimensionality \(U_{rde\text {T}}^{*}\) and \(U_{rde\text {L}}^{*}\), the use of prior measures of simplicity of sensitivity as proxies for posterior measures, and the measure of kernel transparency \(\check{U}_{\partial }\).
Table 6. Result for \({\check{U}}_{\partial }\) confirm that the linear kernel is more transparent than other kernels.

Our proposed measure of kernel transparency \(\check{U}_{\partial }\), a prior measure, scored 100% agreement. This is a good measure that may be used a priori, but it is high-level and not specific to the match between a model and data. No surprises or complexities arose regarding the attributes of kernels.

The general measure based on the number of support vectors, \(U_{sv}\), scored \(81\pm 2.3\%\) agreement—this is a good measure.

Our proposed simplicity of sensitivity measure with Sturges binning \(U_{H\text {st}}\) scored \(64\pm 3.2\%\) and \(62\pm 3.5\%\), which is of limited use; we are interested in agreement that is sufficiently greater than chance (50%) to be reliable.

The same measure with Scott binning (\(U_{H\text {sc}}\)), however, is barely distinguishable from chance in one test and not distinguishable in the other, and with Freedman-Diaconis binning (\(U_{H\text {fd}}\)) it is not distinguishable from chance in either test. We recommend further validation to examine the role of confounding factors such as kernel width/scale along with C per [6, 16].

If the simplicity of sensitivity measure \(U_{H\text {st}}\) can be validated to a greater degree in the future, then the initial measure \(U_{H\text {st}}^{*}\) which scores \(80\pm 3.2\%\) agreement with it, may be used in its place to avoid optimization, or to gain an initial estimate prior to optimization.

The general measures based on the relevant dimensionality of the feature space, \(U_{rdeT}^{*}\) and \(U_{rdeL}^{*}\), scored \(62\pm 5.0\%\) and \(59\pm 5.2\%\) agreement, respectively. These are of some use. We did not include Braun’s noise estimate; in hindsight, including it should improve the measure.

8 Application

We apply model interpretability to results in a toy problem. When we select results for maximum accuracy with the Gaussian RBF kernel, we find that the top result in our sorted list achieves 100% accuracy (rounded to no decimal places) with 51 support vectors. The second best result also achieves 100% accuracy with 40 support vectors, and the fifth best result achieves 100% accuracy with 25 support vectors.

Selecting results for maximum interpretability \(U_{sv,r}\), we find the top result uses 9 support vectors for 99% accuracy and the fourth best result uses 10 support vectors for the same accuracy.

We plot the results (Fig. 3) of accuracy versus interpretability \(U_{sv,r}\) (above 80% in each) and find that there are many results which are highly accurate and highly interpretable, i.e., above 96% in both. These results indicate that there is not a trade-off between accuracy and model interpretability based on support vectors in this data set.
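The selection step can be sketched as follows on hypothetical (accuracy, \(U_{sv,r}\)) pairs; the candidate values and the 1% “equally accurate” tolerance are illustrative, not the study’s numbers.

```python
# Hypothetical candidate models as (accuracy, U_sv_r) pairs.
candidates = [(1.00, 0.76), (1.00, 0.81), (0.99, 0.97), (0.99, 0.96), (0.98, 0.99)]

best_by_accuracy = max(candidates, key=lambda m: m[0])          # ignores interpretability
top_acc = best_by_accuracy[0]
tolerance = 0.01                                                 # "equally accurate" band
best_overall = max((m for m in candidates if m[0] >= top_acc - tolerance),
                   key=lambda m: m[1])                           # most interpretable of those
```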

We also plot accuracy versus interpretability \(U_{sv,r}\) for other data sets (Figs. 4 and 5), and it is clear that there is no overall trend showing a trade-off between accuracy and model interpretability, although such a trend may be present at the Pareto front. A trade-off would show as an inverse correlation, a trend line running from the top left to the bottom right; instead, high interpretability is consistently achievable with high accuracy, i.e., there are points toward the top right of a bounding box for all points.

Fig. 3. In classification for the toy problem, there are many results with high accuracy and high model interpretability, with almost no sacrifice in the latter for maximum accuracy.

Fig. 4. In classification with the Hepatitis data set there is a less than 5% sacrifice in interpretability for the highest accuracy.

Fig. 5. In classification with Statlog Heart data there are points with high accuracy and interpretability, with minimal sacrifice, 1% and 2%, respectively.

9 Related Work

Lipton [30] provides a good taxonomy for model interpretability with concepts falling into two broad categories: transparency (the opposite of a black box) and post-hoc interpretability.

Post-hoc interpretability involves an explanatory model separate from the predictive model, or visuals that transform data where the transformation is also a separate explanatory model. Liang [28] cautions against explaining a black box predictive model with another black box explanatory model.

Ribeiro et al. [46] create an external local linear model to approximate the prediction model in a post-hoc approach called LIME. They jointly optimize accuracy and model complexity, but they do not elucidate model complexity to the degree that we do. LIME perturbs features in a separate binary representation of features, which sometimes map to non-local features in the original space of data. In their examples they use the binary model output, only referring in passing to the possibility of using a continuous output for classifiers, as we do.

Transparency, on the other hand, focuses on the predictive model itself, and has three aspects: decomposability, simulatability and algorithmic transparency [30].

Decomposability refers to being able to see and understand the parts of the model, e.g., kernels and parameters, and the parts of the data, i.e., features and instances, and how they contribute to a result from the predictive model. Some authors refer to the output from decomposition as an interpretation, e.g., an initial understanding, separate from an explanation [24, 39] that may require analysis, selection or perhaps synthesis. Miller adds that explanations are selected and social [38].

Since the social and synthesis tasks are more suitable to a person than a computer, it is reasonable for our work to focus on inherent measures of interpretability rather than explanations.

The authors of [34] express that some types of models are more intelligible (i.e., decomposable) than others. We include categories for generalized linear and generalized additive models in our measures as a result of their work.

Simulatability, as another aspect of transparency, refers to a model that a person can mentally simulate or manually compute in reasonable time [30]; it is correlated, for example, with the number of features in a linear model or the depth of a decision tree. Model complexity is implied in Lipton’s examples, but the term is not invoked, although other authors refer to it [10, 35, 42].

Ockham’s razor, also called the principle of parsimony [50], is a well known principle related to model complexity. Regarding models, it says that among sufficient explanations (e.g., equally accurate models), the simplest should be preferred. A quick note on sufficiency: for multiple equally accurate models, none is necessary, because any one of them is sufficient. Model accuracy is sought first, then simplicity. Using our proposed measures one can search for the model with the highest interpretability among equally accurate models.

Backhaus et al. propose a quantitative measure of model interpretability [3], but for a different meaning or definition: the ability of a model to interpret data, with relevance in relevance vector machines as the context.

Related to our work, sensitivity analysis of model outputs (SAMO) [2, 23] describes how sensitive a model output is to a change in feature values, one at a time, which is the approach taken by our proposed simplicity of sensitivity measure.

In variance-based sensitivity analysis, Sobol [51] finds the variance in the output explained by an input feature. Liu et al. [32] perform entropy-based sensitivity analysis, called global response probabilistic sensitivity analysis (GRPSA), to find the influence of input features, where entropy is used to compute the effect as information loss. Lemaire et al. [27] apply sensitivity analysis, but their perturbations are non-local and could easily create points outside of any known clusters of instances and true states of nature. Poulin et al. [43] provide effective visualization and analysis tools, but for SVM they only apply their method to linear SVM and its binary output.

Automatic model selection methods have been proposed for accuracy [1, 40]—these are based on rules computed from many data sets. The rule-based approach is brittle in comparison to our measures, since it only works with a fixed set of candidate kernels.

10 Conclusions

We developed and validated measures for inherent model interpretability to enable automatic model selection and ongoing research. Two measures are recommended: our proposed kernel transparency measure \(\check{U}_{\partial }\) which is an inexpensive prior measure, and a posterior measure based on support vectors \(U_{sv}\). Three other measures, \(U_{rdeT}^{*}\), \(U_{rdeL}^{*}\) and \(U_{H\text {st}}\) were found to be of limited use but may be further validated by future work.

We also contributed ideas as a foundation for these measures: the concept of inherent model interpretability, a general measure, a simplicity of sensitivity measure, and measurement of interpretability at different points in the learning process, i.e., via prior, initial and posterior models.

We applied our measure to model selection and demonstrated that choosing a model based on a sorted list of accuracy alone can result in models with substantively less inherent model interpretability despite the consistent availability of models with high accuracy and high interpretability in multiple data sets. The notion of a trade-off between accuracy and interpretability does not hold for these data sets.