
Robustness Versus Consistency in Ill-Posed Classification and Regression Problems

Conference paper in Classification and Data Mining

Abstract

It is well-known from parametric statistics that there can be a goal conflict between efficiency and robustness. However, in so-called ill-posed problems, there is even a goal conflict between consistency and robustness. This particularly applies to certain nonparametric statistical problems such as nonparametric classification and regression problems which are often ill-posed. As an example in statistical machine learning, support vector machines are considered.


Notes

  1. If nothing else is stated, we always use the Borel σ-algebras.

  2. Note that it is not appropriate to consider \({\mathcal{R}}_{L,Q}\) instead of \({\mathcal{R}}_{L,P}\) in (3) because we have to evaluate the risk with respect to the true distribution P and not with respect to the erroneous Q.


Author information

Correspondence to Robert Hable.


Appendix

Proof (Theorem 2). 

In order to prove Theorem 2, we assume that \(T_n\), \(n \in \mathbb{N}\), is a risk-consistent learning algorithm and show that \(T_n\), \(n \in \mathbb{N}\), is not qualitatively risk-robust. According to the assumptions on \(\varphi\), for every \(m \in \mathbb{N}\) there is a \(c_m \in [0,\infty)\) such that \(\varphi(c_m) \geq m\). For every \(t \in \mathbb{R}\), let \(\delta_t\) denote the Dirac measure at t; let \(U_{[0,1]}\) denote the uniform distribution on \(\mathcal{X} = [0,1]\). Define

$$P\bigl(d(x,y)\bigr) \;:=\; \delta_0(dy)\,U_{[0,1]}(dx) \qquad\text{and}\qquad Q_m\bigl(d(x,y)\bigr) \;:=\; \delta_{g_m(x)}(dy)\,U_{[0,1]}(dx)$$

where \(g_m : [0,1] \rightarrow \mathbb{R},\; x \mapsto (-mc_m x + 4c_m)\,I_{[0,4/m]}(x)\), for every \(m \in \mathbb{N}\). Note that, for every Borel-measurable set \(B \subset \mathcal{X} \times \mathcal{Y}\),

$$Q_m\Bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\Bigr) \;=\; P\Bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\Bigr).$$
(4)
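To make the construction tangible, the following minimal Python sketch samples from \(P\) and \(Q_m\). It is not from the paper; \(\varphi(t) = t^2\) and hence \(c_m = \sqrt{m}\) are illustrative choices satisfying \(\varphi(c_m) \geq m\), and the helper names are ours.

```python
import numpy as np

def g(m, x, c_m):
    """The ramp g_m: equals -m*c_m*x + 4*c_m on [0, 4/m] and 0 elsewhere."""
    return np.where(x <= 4.0 / m, -m * c_m * x + 4.0 * c_m, 0.0)

def sample_P(n, rng):
    """n i.i.d. draws from P: X ~ U[0,1], Y = 0 a.s. (Dirac at 0)."""
    x = rng.uniform(0.0, 1.0, n)
    return x, np.zeros(n)

def sample_Q(m, n, rng):
    """n i.i.d. draws from Q_m: X ~ U[0,1], Y = g_m(X) a.s."""
    c_m = np.sqrt(m)                      # illustrative: phi(c_m) = c_m**2 = m
    x = rng.uniform(0.0, 1.0, n)
    return x, g(m, x, c_m)

rng = np.random.default_rng(0)
x, y = sample_Q(m=100, n=100_000, rng=rng)
print("Q_m(Y != 0):", np.mean(y > 0))     # ~ 4/m = 0.04; beyond x = 4/m,
                                          # Q_m agrees with P, as in (4)
```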

Obviously, \(\inf_{f \in \mathcal{L}_0(\mathcal{X})} \mathcal{R}_{L,P}(f) = 0\) and \(\inf_{f \in \mathcal{L}_0(\mathcal{X})} \mathcal{R}_{L,Q_m}(f) = 0\) for every \(m \in \mathbb{N}\). Then, risk-consistency implies: for every \(m \in \mathbb{N}\), there is an \(n_m \in \mathbb{N}\) such that, for every \(n \geq n_m\),

$$Q_m^n\Bigl(\bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,Q_m}(f_{D_n}) < \tfrac{1}{3}\bigr\}\Bigr) \;\geq\; \tfrac{2}{3}$$
(5)
$$P^n\Bigl(\bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,P}(f_{D_n}) < \tfrac{1}{3}\bigr\}\Bigr) \;\geq\; \tfrac{2}{3}.$$
(6)

For every \(m, n \in \mathbb{N}\), define \(B_m^{(n)} := \bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,Q_m}(f_{D_n}) < \tfrac{1}{3}\bigr\}\) and

$$A_m(D_n) \;:=\; \bigl\{x \in \mathcal{X} \,\big\vert\; x \leq \tfrac{2}{m},\; f_{D_n}(x) \leq c_m\bigr\} \qquad \forall\, D_n \in (\mathcal{X} \times \mathcal{Y})^n.$$

Note that, since \(g_m(x) = c_m(4 - mx) \geq 2c_m\) for all \(x \leq \tfrac{2}{m}\), the definitions imply

$$g_m(x) - f_{D_n}(x) \;\geq\; 2c_m - c_m \;=\; c_m \;\geq\; 0 \qquad \forall\, x \in A_m(D_n).$$
(7)

Hence, for every \(m \in \mathbb{N}\), \(n \geq n_m\), and \(D_n \in B_m^{(n)}\),

$$\begin{aligned} \frac{1}{3} \;>\; \mathcal{R}_{L,Q_m}(f_{D_n}) \;&\geq\; \int_{A_m(D_n)} \int_{\mathbb{R}} \varphi\bigl(\lvert y - f_{D_n}(x)\rvert\bigr)\, \delta_{g_m(x)}(dy)\, U_{[0,1]}(dx) \\ &= \int_{A_m(D_n)} \varphi\bigl(\lvert g_m(x) - f_{D_n}(x)\rvert\bigr)\, U_{[0,1]}(dx) \;\stackrel{(7)}{\geq}\; \varphi(c_m) \cdot U_{[0,1]}\bigl(A_m(D_n)\bigr) \\ &\geq\; m \cdot U_{[0,1]}\bigl(A_m(D_n)\bigr). \end{aligned}$$
(8)
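In particular, (8) shows \(U_{[0,1]}\bigl(A_m(D_n)\bigr) < \frac{1}{3m}\), which is the bound invoked in (9) below.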

Next, it follows for every \(m \in \mathbb{N}\), \(n \geq n_m\), and \(D_n \in B_m^{(n)}\) that

$$\begin{aligned} U_{[0,1]}\bigl(\bigl\{x \in \mathcal{X} \,\big\vert\; f_{D_n}(x) > c_m\bigr\}\bigr) \;&\geq\; U_{[0,1]}\bigl(\bigl\{x \in \mathcal{X} \,\big\vert\; x \leq \tfrac{2}{m},\; f_{D_n}(x) > c_m\bigr\}\bigr) \\ &= U_{[0,1]}\bigl(\bigl\{x \in \mathcal{X} \,\big\vert\; x \leq \tfrac{2}{m}\bigr\}\bigr) - U_{[0,1]}\bigl(A_m(D_n)\bigr) \;\stackrel{(8)}{\geq}\; \frac{2}{m} - \frac{1}{3m} \;>\; \frac{1}{m} \end{aligned}$$
(9)

and, therefore,

$$\begin{aligned} \mathcal{R}_{L,P}\bigl(f_{D_n}\bigr) \;&\geq\; \int_{\{x \in \mathcal{X} \mid f_{D_n}(x) > c_m\}} \int_{\mathbb{R}} \varphi\bigl(\lvert y - f_{D_n}(x)\rvert\bigr)\, \delta_0(dy)\, U_{[0,1]}(dx) \\ &= \int_{\{x \in \mathcal{X} \mid f_{D_n}(x) > c_m\}} \varphi\bigl(\lvert f_{D_n}(x)\rvert\bigr)\, U_{[0,1]}(dx) \\ &\geq\; m \cdot U_{[0,1]}\bigl(\bigl\{x \in \mathcal{X} \,\big\vert\; f_{D_n}(x) > c_m\bigr\}\bigr) \;\stackrel{(9)}{\geq}\; 1. \end{aligned}$$
(10)

Define \(C := [1,\infty)\). Then, for every \(m \in \mathbb{N}\) and \(n \geq n_m\),

$$\begin{aligned} \bigl[\mathcal{R}_{L,P} \circ T_n(Q_m^n)\bigr](C) \;&=\; Q_m^n\bigl(\bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,P}(f_{D_n}) \geq 1\bigr\}\bigr) \;\stackrel{(10)}{\geq}\; Q_m^n\bigl(B_m^{(n)}\bigr) \;\stackrel{(5)}{\geq}\; \frac{2}{3} \;=\; \frac{1}{3} + \frac{1}{3} \\ &\stackrel{(6)}{\geq}\; P^n\bigl(\bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,P}(f_{D_n}) \geq \tfrac{1}{3}\bigr\}\bigr) + \frac{1}{3} \\ &\geq\; \bigl[\mathcal{R}_{L,P} \circ T_n(P^n)\bigr]\bigl(C^{\frac{1}{3}}\bigr) + \frac{1}{3} \end{aligned}$$

where \(C^{\frac{1}{3}} = \bigl\{z \in \mathbb{R} \,\big\vert\; \inf_{z' \in C} \lvert z - z'\rvert < \tfrac{1}{3}\bigr\} = (\tfrac{2}{3}, \infty)\) as in the definition of \(d_{\text{Pro}}\). This implies

$$d_{\text{Pro}}\bigl(\mathcal{R}_{L,P} \circ T_n(Q_m^n),\, \mathcal{R}_{L,P} \circ T_n(P^n)\bigr) \;\geq\; \frac{1}{3} \qquad \forall\, n \geq n_m\;\; \forall\, m \in \mathbb{N}.$$
(11)

However, for every \(m \in \mathbb{N}\) and every measurable \(B \subset \mathcal{X} \times \mathcal{Y}\), we have

$$\begin{aligned} Q_m(B) \;&=\; Q_m\bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x \leq \tfrac{4}{m}\bigr\} \cap B\bigr) + Q_m\bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\bigr) \\ &\leq\; \frac{4}{m} + Q_m\bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\bigr) \;\stackrel{(4)}{=}\; \frac{4}{m} + P\bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\bigr) \\ &\leq\; \frac{4}{m} + P\bigl(B^{\frac{4}{m}}\bigr) \end{aligned}$$

and, therefore,

$$d_{\text{Pro}}\bigl(Q_m, P\bigr) \;\leq\; \frac{4}{m} \qquad \forall\, m \in \mathbb{N}.$$
(12)
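Under the same illustrative choices as in the sketch above (\(\varphi(t) = t^2\), \(c_m = \sqrt{m}\)), the \(Q_m\)-optimal predictor is \(g_m\) itself (its \(Q_m\)-risk is 0), and a direct computation gives \(\mathcal{R}_{L,P}(g_m) = \int_0^{4/m} g_m(x)^2\,dx = \tfrac{64}{3}\) for every \(m\). The following sketch (again ours, not the paper's) contrasts this constant \(P\)-risk with the vanishing Prokhorov bound (12):

```python
import numpy as np

def risk_P_of_gm(m, n_mc=10**6, seed=0):
    """Monte Carlo estimate of R_{L,P}(g_m) with phi(t) = t^2, c_m = sqrt(m);
    under P we have Y = 0 a.s., so the loss at x is simply g_m(x)**2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_mc)
    c_m = np.sqrt(m)
    gx = np.where(x <= 4.0 / m, -m * c_m * x + 4.0 * c_m, 0.0)
    return np.mean(gx ** 2)               # exact value: 64/3 for every m

for m in (10, 100, 1000):
    print(f"m = {m:4d}   d_Pro(Q_m, P) <= {4.0 / m:.4f}   "
          f"R_L,P(g_m) ~ {risk_P_of_gm(m):.1f}")
# The Prokhorov bound shrinks like 4/m while the P-risk of the Q_m-optimal
# predictor stays near 64/3 ~ 21.3: arbitrarily small contamination keeps
# the P-risk bounded away from 0.
```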

Inequalities (11) and (12) imply that \(T_n\), \(n \in \mathbb{N}\), is not qualitatively risk-robust.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hable, R., Christmann, A. (2013). Robustness Versus Consistency in Ill-Posed Classification and Regression Problems. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_4

