# Correction to: Efficient feature selection using shrinkage estimators

## Correction to: Machine Learning (2019) 108:1261–1286 https://doi.org/10.1007/s10994-019-05795-1

There was a mistake in the proof of the optimal shrinkage intensity for our estimator presented in Section 3.1. The main theorem still holds, and the shrinkage intensity presented in the corrected version is optimal in the sense of minimizing the mean squared error (MSE). In this document, besides correcting the proof, we verify the corrected intensity empirically via simulations. The third term of Theorem 1 needs to be corrected as follows:

\begin{aligned} \widehat{\mathbb {E}}\left[ (\hat{p}^{\mathrm{Ind}}(xy))^2\right]&= \frac{1}{N^3}\bigg ( (N-1)(N-2)(N-3)\big ( \hat{p}^{\mathrm{ML}}(x) \hat{p}^{\mathrm{ML}}(y) \big )^2 \\&\qquad \qquad + (N-1) (N-2)\hat{p}^{\mathrm{ML}}(x) \hat{p}^{\mathrm{ML}}(y) \big (\hat{p}^{\mathrm{ML}}(x)+\hat{p}^{\mathrm{ML}}(y)+4\hat{p}^{\mathrm{ML}}(xy)\big ) \\&\qquad \qquad +(N-1)\big (2\hat{p}^{\mathrm{ML}}(xy)(\hat{p}^{\mathrm{ML}}(x)+\hat{p}^{\mathrm{ML}}(y))+2(\hat{p}^{\mathrm{ML}}(xy))^2 \\&\qquad \qquad +\hat{p}^{\mathrm{ML}}(x) \hat{p}^{\mathrm{ML}}(y)\big ) + \hat{p}^{\mathrm{ML}}(xy) \bigg ). \end{aligned}
(1)

Parts of pages 4–6 of the supplementary material, where the above term is derived, need the following corrections. On page 4, the term A(xy) needs to be corrected as follows:

\begin{aligned} {A(xy)} ={\sum _{\begin{array}{c} x',x'' \in \mathcal {X}\\ x'\ne x'' \ne x \end{array}}\sum _{\begin{array}{c} y', y'' \in \mathcal {Y}\\ y'\ne y'' \end{array}}{\mathbb {E}} \left[ {N_{xy'}N_{xy''} N_{x'y}N_{x''y}}\right] +2\sum _{\begin{array}{c} x' \in \mathcal {X}\\ x'\ne x \end{array}}\sum _{\begin{array}{c} y', y'' \in \mathcal {Y}\\ y'\ne y'' \ne y \end{array}}{\mathbb {E}} \left[ {N_{xy'}N_{xy''} N_{x'y}N_{xy}}\right] }. \end{aligned}

As a consequence, the same term on page 5 needs to be corrected:

\begin{aligned} {A(xy)}=&{N^{(4)} \Bigg [\bigg (p(x)^2-\sum _{y' \in \mathcal {Y}}p(xy')^2\bigg )\bigg (p(y)^2-\sum _{x' \in \mathcal {X}}p(x'y)^2\bigg )}\\&{-4 \big (p(x)-p(xy)\big )p(xy)^2\big (p(y)-p(xy)\big )\Bigg ]}. \end{aligned}

Finally, the first equation on page 6 needs the following correction:

\begin{aligned}&{\sum _{x',x'' \in \mathcal {X}}\sum _{y', y'' \in \mathcal {Y}}{\mathbb {E}} \left[ {N_{xy'}N_{xy''} N_{x'y}N_{x''y}}\right] =}{N^{(4)}p(x)^2p(y)^2}\\&\quad {+N^{(3)}p(x)p(y)(p(x)+p(y)+4p(xy))}\\&\quad {+N^{(2)}\big [2p(xy)(p(x)+p(y))+2p(xy)^2+p(x)p(y)\big ]}\\&\quad {+Np(xy),} \end{aligned}

which results in the estimate for $$\widehat{\mathbb {E}}\left[ (\hat{p}^{\mathrm{Ind}}(xy))^2\right]$$ presented in Eq. (1).
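The page-6 identity can be checked numerically. The following is a minimal sketch and not part of the original supplementary material: it assumes the counts follow a multinomial distribution with joint probabilities p(xy), uses the fact that the four-fold sum equals the expectation of N_x² N_y² (with N_x and N_y the marginal counts), and compares the closed form against an exact enumeration over all count tables for a small N. The function names `falling`, `closed_form`, `second_moment_ind`, `compositions` and `exact_expectation`, as well as the example table, are hypothetical and chosen for illustration only.

```python
from math import factorial, prod


def falling(n, k):
    """Falling factorial N^(k) = N (N-1) ... (N-k+1)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out


def closed_form(n, px, py, pxy):
    """Right-hand side of the page-6 identity, i.e. E[N_x^2 N_y^2]."""
    return (falling(n, 4) * px**2 * py**2
            + falling(n, 3) * px * py * (px + py + 4 * pxy)
            + falling(n, 2) * (2 * pxy * (px + py) + 2 * pxy**2 + px * py)
            + n * pxy)


def second_moment_ind(n, px, py, pxy):
    """E[(p_ind(xy))^2], using p_ind(xy) = N_x N_y / N^2."""
    return closed_form(n, px, py, pxy) / n**4


def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def exact_expectation(n, p, x, y):
    """E[N_x^2 N_y^2] by enumerating all count tables of a multinomial(n, p)."""
    rows, cols = len(p), len(p[0])
    flat = [p[i][j] for i in range(rows) for j in range(cols)]
    total = 0.0
    for counts in compositions(n, len(flat)):
        coef = factorial(n) // prod(factorial(c) for c in counts)
        prob = coef * prod(q**c for q, c in zip(flat, counts))
        n_x = sum(counts[x * cols + j] for j in range(cols))  # row-x marginal count
        n_y = sum(counts[i * cols + y] for i in range(rows))  # column-y marginal count
        total += prob * n_x**2 * n_y**2
    return total


# Illustrative 2x2 joint distribution (an assumption, not taken from the paper).
p = [[0.3, 0.2], [0.1, 0.4]]
n = 6
px, py, pxy = sum(p[0]), p[0][0] + p[1][0], p[0][0]
print(closed_form(n, px, py, pxy), exact_expectation(n, p, 0, 0))
```

The two printed numbers should agree up to floating-point error. Dividing the closed form by N⁴ and plugging in the maximum-likelihood estimates for p(x), p(y) and p(xy) gives the expression in Eq. (1).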

Apart from correcting the proof, we also provide simulation results that validate the corrected optimal shrinkage intensity. To this end, we followed the procedure described in Section 3.2 of the main paper to generate probabilities that lead to different effect sizes, i.e. different population values of the mutual information I(X;Y). The squared error of our shrinkage estimator for the probabilities is defined as $$\sum _{x \in \mathcal {X}}\sum _{y \in \mathcal {Y}} \left( p(xy) - \hat{p}^{\mathrm{Ind-JS}}(xy) \right) ^{2}.$$ We estimated the MSE by averaging this quantity over 1000 simulation runs. In Fig. 1 we present the results for three different effect sizes: I(X;Y) = 0.01, 0.05 and 0.15. In each graph we plot the MSE for shrinkage intensities across the whole interval [0, 1], and we mark both the corrected optimal intensity $$\lambda ^{*}$$ and the value erroneously used in the previous version of the paper, $$\lambda _{e}^{*}$$. As the plots show, the corrected value leads to the minimum MSE.
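The simulation loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for Fig. 1. It assumes the shrinkage estimator has the convex-combination form λ·p̂^Ind(xy) + (1 − λ)·p̂^ML(xy), which is not restated in this correction. It uses a hand-picked joint distribution rather than the generation procedure of Section 3.2, and it locates the minimizer on a grid instead of evaluating the closed-form λ*. The name `shrinkage_mse`, its parameters and the example table are hypothetical.

```python
import random


def shrinkage_mse(p, n, lambdas, runs=1000, seed=0):
    """Monte Carlo MSE of lam * p_ind + (1 - lam) * p_ml for each lam in lambdas.

    p is the true joint distribution as a list of rows, n the sample size.
    The convex-combination form of the estimator is an assumption.
    """
    rng = random.Random(seed)
    rows, cols = len(p), len(p[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    weights = [p[i][j] for i, j in cells]
    totals = [0.0] * len(lambdas)
    for _ in range(runs):
        # Draw n observations and tabulate the joint counts.
        counts = [[0] * cols for _ in range(rows)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            counts[i][j] += 1
        # Maximum-likelihood joint estimate and its marginals.
        p_ml = [[c / n for c in row] for row in counts]
        p_x = [sum(row) for row in p_ml]
        p_y = [sum(p_ml[i][j] for i in range(rows)) for j in range(cols)]
        for k, lam in enumerate(lambdas):
            # Squared error summed over all cells (x, y) for this intensity.
            err = 0.0
            for i, j in cells:
                est = lam * p_x[i] * p_y[j] + (1 - lam) * p_ml[i][j]
                err += (p[i][j] - est) ** 2
            totals[k] += err
    return [t / runs for t in totals]


# Illustrative weakly dependent 2x2 distribution (not taken from the paper).
lambdas = [k / 20 for k in range(21)]
p_weak = [[0.27, 0.23], [0.23, 0.27]]
mse = shrinkage_mse(p_weak, n=50, lambdas=lambdas)
best = min(range(len(lambdas)), key=mse.__getitem__)
print("grid minimizer:", lambdas[best], "estimated MSE:", mse[best])
```

Evaluating every intensity on the same simulated samples keeps the resulting MSE curve smooth, which makes the grid minimizer easier to compare against a closed-form intensity.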

## Acknowledgements

We would like to thank Prof. Jan Mielniczuk and Małgorzata Łazęcka for bringing this issue to our attention and for their detailed and insightful comments.

## Author information

### Corresponding author

Correspondence to Konstantinos Sechidis.