Abstract
It is well known from parametric statistics that efficiency and robustness can be conflicting goals. In so-called ill-posed problems, however, there is even a conflict between consistency and robustness. This applies in particular to nonparametric classification and regression problems, which are often ill-posed. As an example from statistical machine learning, support vector machines are considered.
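Stated schematically (a reading inferred from the appendix proof, under the paper's assumptions on the loss, which provide an unbounded function \(\varphi\)): for a learning algorithm \(T_n\), \(n \in \mathbb{N}\),

\[
T_n \ \text{risk-consistent} \;\Longrightarrow\; T_n \ \text{not qualitatively risk-robust}.
\]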
Notes
1. If nothing else is stated, we always use the Borel σ-algebras.
2. Note that it is not appropriate to consider \(\mathcal{R}_{L,Q}\) instead of \(\mathcal{R}_{L,P}\) in (3) because we have to evaluate the risk with respect to the true distribution \(P\) and not with respect to the erroneous \(Q\).
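For completeness, \(\mathcal{R}_{L,P}\) is the usual risk functional of statistical learning theory; the following standard definition is recalled here for the reader (it is not restated in the notes themselves):

\[
\mathcal{R}_{L,P}(f) \;=\; \int_{\mathcal{X} \times \mathcal{Y}} L\bigl(x, y, f(x)\bigr)\, \mathrm{d}P(x,y),
\]

i.e., the expected loss of a predictor \(f\) when the data follow \(P\). Evaluating \(\mathcal{R}_{L,Q}\) instead would measure performance under the contaminated \(Q\) rather than under the true data-generating distribution, which is exactly what Note 2 rules out.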
Appendix
Proof (Theorem 2).
In order to prove Theorem 2, we assume that \(T_n\), \(n \in \mathbb{N}\), is a risk-consistent learning algorithm and we show that \(T_n\), \(n \in \mathbb{N}\), is not qualitatively risk-robust. According to the assumptions on \(\varphi\), for every \(m \in \mathbb{N}\), there is a \(c_m \in [0,\infty)\) such that \(\varphi(c_m) \geq m\). For every \(t \in \mathbb{R}\), let \(\delta_t\) denote the Dirac measure at \(t\); let \(U_{[0,1]}\) denote the uniform distribution on \(\mathcal{X} = [0,1]\). Define
for \(g_m : [0,1] \rightarrow \mathbb{R},\ x \mapsto \bigl(-m c_m x + 4 c_m\bigr)\, I_{[0,4/m]}(x)\) for every \(m \in \mathbb{N}\). Note that, for every Borel-measurable set \(B \subset \mathcal{X} \times \mathcal{Y}\),
Obviously, \(\inf_{f \in \mathcal{L}_0(\mathcal{X})} \mathcal{R}_{L,P}(f) = 0\) and \(\inf_{f \in \mathcal{L}_0(\mathcal{X})} \mathcal{R}_{L,Q_m}(f) = 0\) for every \(m \in \mathbb{N}\). Then, risk-consistency implies: for every \(m \in \mathbb{N}\), there is an \(n_m \in \mathbb{N}\) such that, for every \(n \geq n_m\),
For every \(m, n \in \mathbb{N}\), define \(B_m^{(n)} := \bigl\{ D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\bigm\vert\, \mathcal{R}_{L,Q_m}(f_{D_n}) < \tfrac{1}{3} \bigr\}\) and
Note that the definitions imply
Hence, for every \(m \in \mathbb{N}\), \(n \geq n_m\), and \(D_n \in B_m^{(n)}\),
Next, it follows for every \(m \in \mathbb{N}\), \(n \geq n_m\), and \(D_n \in B_m^{(n)}\) that
and, therefore,
Define \(C := [1,\infty)\). Then, for every \(m \in \mathbb{N}\) and \(n \geq n_m\),
where \(C^{\frac{1}{3}} = \bigl\{ z \in \mathbb{R} \,\bigm\vert\, \inf_{z' \in C} \vert z - z'\vert < \tfrac{1}{3} \bigr\}\) as in the definition of \(d_{\mathrm{Pro}}\). This implies
However, for every \(m \in \mathbb{N}\) and every measurable \(B \subset \mathcal{X} \times \mathcal{Y}\), we have
and, therefore,
Inequalities (11) and (12) imply that \(T_n\), \(n \in \mathbb{N}\), is not qualitatively risk-robust.
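For orientation, the contradiction can be summarized in terms of the Prokhorov metric and a risk version of Hampel's qualitative robustness; the following is a schematic reading inferred from the proof, not the paper's verbatim definitions. The Prokhorov metric on probability measures is

\[
d_{\mathrm{Pro}}(\mu,\nu) \;=\; \inf\bigl\{\varepsilon > 0 \,\bigm\vert\, \mu(B) \leq \nu(B^{\varepsilon}) + \varepsilon \ \text{for all Borel sets } B \bigr\},
\qquad
B^{\varepsilon} := \bigl\{z \,\bigm\vert\, \inf_{z' \in B} \vert z - z'\vert < \varepsilon \bigr\},
\]

and qualitative risk-robustness of \(T_n\), \(n \in \mathbb{N}\), at \(P\) requires: for every \(\varepsilon > 0\) there is a \(\delta > 0\) such that

\[
d_{\mathrm{Pro}}(P,Q) < \delta
\;\Longrightarrow\;
\sup_{n \in \mathbb{N}} d_{\mathrm{Pro}}\Bigl(\mathcal{L}_{P^n}\bigl(\mathcal{R}_{L,P}(f_{D_n})\bigr),\; \mathcal{L}_{Q^n}\bigl(\mathcal{R}_{L,P}(f_{D_n})\bigr)\Bigr) < \varepsilon,
\]

where \(\mathcal{L}_{P^n}(\cdot)\) denotes the law of \(D_n \mapsto \mathcal{R}_{L,P}(f_{D_n})\) when \(D_n \sim P^n\). The proof exhibits \(Q_m\) with \(d_{\mathrm{Pro}}(Q_m, P) \to 0\) while, by inequalities (11) and (12), the two laws of the risk stay at Prokhorov distance at least \(\tfrac{1}{3}\) for \(n \geq n_m\); hence no such \(\delta\) exists.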