
Robustness Versus Consistency in Ill-Posed Classification and Regression Problems

Conference paper in Classification and Data Mining

Abstract

It is well-known from parametric statistics that there can be a goal conflict between efficiency and robustness. However, in so-called ill-posed problems, there is even a goal conflict between consistency and robustness. This particularly applies to certain nonparametric statistical problems such as nonparametric classification and regression problems which are often ill-posed. As an example in statistical machine learning, support vector machines are considered.


Notes

  1. If nothing else is stated, we always use the Borel σ-algebras.

  2. Note that it is not appropriate to consider \({\mathcal{R}}_{L,Q}\) instead of \({\mathcal{R}}_{L,P}\) in (3) because we have to evaluate the risk with respect to the true distribution P and not with respect to the erroneous Q.


Author information

Correspondence to Robert Hable.


Appendix

Proof (Theorem 2). 

In order to prove Theorem 2, we assume that \(T_n\), \(n \in \mathbb{N}\), is a risk-consistent learning algorithm and show that \(T_n\), \(n \in \mathbb{N}\), is not qualitatively risk-robust. According to the assumptions on \(\varphi\), for every \(m \in \mathbb{N}\) there is a \(c_m \in [0,\infty)\) such that \(\varphi(c_m) \geq m\). For every \(t \in \mathbb{R}\), let \(\delta_t\) denote the Dirac measure at t; let \(U_{[0,1]}\) denote the uniform distribution on \(\mathcal{X} = [0,1]\). Define

$$P\bigl(d(x,y)\bigr) \;:=\; \delta_0(dy)\,U_{[0,1]}(dx) \qquad\text{and}\qquad Q_m\bigl(d(x,y)\bigr) \;:=\; \delta_{g_m(x)}(dy)\,U_{[0,1]}(dx)$$

where \(g_m : [0,1] \rightarrow \mathbb{R},\; x \mapsto (-mc_m x + 4c_m)\,I_{[0,4/m]}(x)\), for every \(m \in \mathbb{N}\). Note that, for every Borel-measurable set \(B \subset \mathcal{X} \times \mathcal{Y}\),

$$Q_m\Bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\Bigr) \;=\; P\Bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\Bigr).$$
(4)
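To make the construction tangible, the following minimal Python sketch samples from \(P\) and \(Q_m\). It is not from the paper; \(\varphi(t) = t^2\) and hence \(c_m = \sqrt{m}\) are illustrative choices satisfying \(\varphi(c_m) \geq m\), and the helper names are ours.

```python
import numpy as np

def g(m, x, c_m):
    """The ramp g_m: equals -m*c_m*x + 4*c_m on [0, 4/m] and 0 elsewhere."""
    return np.where(x <= 4.0 / m, -m * c_m * x + 4.0 * c_m, 0.0)

def sample_P(n, rng):
    """n i.i.d. draws from P: X ~ U[0,1], Y = 0 a.s. (Dirac at 0)."""
    x = rng.uniform(0.0, 1.0, n)
    return x, np.zeros(n)

def sample_Q(m, n, rng):
    """n i.i.d. draws from Q_m: X ~ U[0,1], Y = g_m(X) a.s."""
    c_m = np.sqrt(m)                      # illustrative: phi(c_m) = c_m**2 = m
    x = rng.uniform(0.0, 1.0, n)
    return x, g(m, x, c_m)

rng = np.random.default_rng(0)
x, y = sample_Q(m=100, n=100_000, rng=rng)
print("Q_m(Y != 0):", np.mean(y > 0))     # ~ 4/m = 0.04; beyond x = 4/m,
                                          # Q_m agrees with P, as in (4)
```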

Obviously, \(\inf_{f \in \mathcal{L}_0(\mathcal{X})} \mathcal{R}_{L,P}(f) = 0\) and \(\inf_{f \in \mathcal{L}_0(\mathcal{X})} \mathcal{R}_{L,Q_m}(f) = 0\) for every \(m \in \mathbb{N}\). Then, risk-consistency implies: for every \(m \in \mathbb{N}\), there is an \(n_m \in \mathbb{N}\) such that, for every \(n \geq n_m\),

$$Q_m^n\Bigl(\bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,Q_m}(f_{D_n}) < \tfrac{1}{3}\bigr\}\Bigr) \;\geq\; \tfrac{2}{3}$$
(5)
$$P^n\Bigl(\bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,P}(f_{D_n}) < \tfrac{1}{3}\bigr\}\Bigr) \;\geq\; \tfrac{2}{3}.$$
(6)

For every \(m, n \in \mathbb{N}\), define \(B_m^{(n)} := \bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,Q_m}(f_{D_n}) < \tfrac{1}{3}\bigr\}\) and

$$A_m(D_n) \;:=\; \bigl\{x \in \mathcal{X} \,\big\vert\; x \leq \tfrac{2}{m},\; f_{D_n}(x) \leq c_m\bigr\} \qquad \forall\, D_n \in (\mathcal{X} \times \mathcal{Y})^n.$$

Note that, since \(g_m(x) = c_m(4 - mx) \geq 2c_m\) for all \(x \leq \tfrac{2}{m}\), the definitions imply

$$g_m(x) - f_{D_n}(x) \;\geq\; 2c_m - c_m \;=\; c_m \;\geq\; 0 \qquad \forall\, x \in A_m(D_n).$$
(7)

Hence, for every \(m \in \mathbb{N}\), \(n \geq n_m\), and \(D_n \in B_m^{(n)}\),

$$\begin{aligned} \frac{1}{3} \;>\; \mathcal{R}_{L,Q_m}(f_{D_n}) \;&\geq\; \int_{A_m(D_n)} \int_{\mathbb{R}} \varphi\bigl(\lvert y - f_{D_n}(x)\rvert\bigr)\, \delta_{g_m(x)}(dy)\, U_{[0,1]}(dx) \\ &= \int_{A_m(D_n)} \varphi\bigl(\lvert g_m(x) - f_{D_n}(x)\rvert\bigr)\, U_{[0,1]}(dx) \;\stackrel{(7)}{\geq}\; \varphi(c_m) \cdot U_{[0,1]}\bigl(A_m(D_n)\bigr) \\ &\geq\; m \cdot U_{[0,1]}\bigl(A_m(D_n)\bigr). \end{aligned}$$
(8)
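In particular, (8) shows \(U_{[0,1]}\bigl(A_m(D_n)\bigr) < \frac{1}{3m}\), which is the bound invoked in (9) below.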

Next, it follows for every \(m \in \mathbb{N}\), \(n \geq n_m\), and \(D_n \in B_m^{(n)}\) that

$$\begin{aligned} U_{[0,1]}\bigl(\bigl\{x \in \mathcal{X} \,\big\vert\; f_{D_n}(x) > c_m\bigr\}\bigr) \;&\geq\; U_{[0,1]}\bigl(\bigl\{x \in \mathcal{X} \,\big\vert\; x \leq \tfrac{2}{m},\; f_{D_n}(x) > c_m\bigr\}\bigr) \\ &= U_{[0,1]}\bigl(\bigl\{x \in \mathcal{X} \,\big\vert\; x \leq \tfrac{2}{m}\bigr\}\bigr) - U_{[0,1]}\bigl(A_m(D_n)\bigr) \;\stackrel{(8)}{\geq}\; \frac{2}{m} - \frac{1}{3m} \;>\; \frac{1}{m} \end{aligned}$$
(9)

and, therefore,

$$\begin{aligned} \mathcal{R}_{L,P}\bigl(f_{D_n}\bigr) \;&\geq\; \int_{\{x \in \mathcal{X} \mid f_{D_n}(x) > c_m\}} \int_{\mathbb{R}} \varphi\bigl(\lvert y - f_{D_n}(x)\rvert\bigr)\, \delta_0(dy)\, U_{[0,1]}(dx) \\ &= \int_{\{x \in \mathcal{X} \mid f_{D_n}(x) > c_m\}} \varphi\bigl(\lvert f_{D_n}(x)\rvert\bigr)\, U_{[0,1]}(dx) \\ &\geq\; m \cdot U_{[0,1]}\bigl(\bigl\{x \in \mathcal{X} \,\big\vert\; f_{D_n}(x) > c_m\bigr\}\bigr) \;\stackrel{(9)}{\geq}\; 1. \end{aligned}$$
(10)

Define \(C := [1,\infty)\). Then, for every \(m \in \mathbb{N}\) and \(n \geq n_m\),

$$\begin{aligned} \bigl[\mathcal{R}_{L,P} \circ T_n(Q_m^n)\bigr](C) \;&=\; Q_m^n\bigl(\bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,P}(f_{D_n}) \geq 1\bigr\}\bigr) \;\stackrel{(10)}{\geq}\; Q_m^n\bigl(B_m^{(n)}\bigr) \;\stackrel{(5)}{\geq}\; \frac{2}{3} \;=\; \frac{1}{3} + \frac{1}{3} \\ &\stackrel{(6)}{\geq}\; P^n\bigl(\bigl\{D_n \in (\mathcal{X} \times \mathcal{Y})^n \,\big\vert\; \mathcal{R}_{L,P}(f_{D_n}) \geq \tfrac{1}{3}\bigr\}\bigr) + \frac{1}{3} \\ &\geq\; \bigl[\mathcal{R}_{L,P} \circ T_n(P^n)\bigr]\bigl(C^{\frac{1}{3}}\bigr) + \frac{1}{3} \end{aligned}$$

where \(C^{\frac{1}{3}} = \bigl\{z \in \mathbb{R} \,\big\vert\; \inf_{z' \in C} \lvert z - z'\rvert < \tfrac{1}{3}\bigr\} = (\tfrac{2}{3}, \infty)\) as in the definition of \(d_{\text{Pro}}\). This implies

$$d_{\text{Pro}}\bigl(\mathcal{R}_{L,P} \circ T_n(Q_m^n),\, \mathcal{R}_{L,P} \circ T_n(P^n)\bigr) \;\geq\; \frac{1}{3} \qquad \forall\, n \geq n_m\;\; \forall\, m \in \mathbb{N}.$$
(11)

However, for every \(m \in \mathbb{N}\) and every measurable \(B \subset \mathcal{X} \times \mathcal{Y}\), we have

$$\begin{aligned} Q_m(B) \;&=\; Q_m\bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x \leq \tfrac{4}{m}\bigr\} \cap B\bigr) + Q_m\bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\bigr) \\ &\leq\; \frac{4}{m} + Q_m\bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\bigr) \;\stackrel{(4)}{=}\; \frac{4}{m} + P\bigl(\bigl\{(x,y) \in \mathcal{X} \times \mathcal{Y} \,\big\vert\; x > \tfrac{4}{m}\bigr\} \cap B\bigr) \\ &\leq\; \frac{4}{m} + P\bigl(B^{\frac{4}{m}}\bigr) \end{aligned}$$

and, therefore,

$$d_{\text{Pro}}\bigl(Q_m, P\bigr) \;\leq\; \frac{4}{m} \qquad \forall\, m \in \mathbb{N}.$$
(12)
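Under the same illustrative choices as in the sketch above (\(\varphi(t) = t^2\), \(c_m = \sqrt{m}\)), the \(Q_m\)-optimal predictor is \(g_m\) itself (its \(Q_m\)-risk is 0), and a direct computation gives \(\mathcal{R}_{L,P}(g_m) = \int_0^{4/m} g_m(x)^2\,dx = \tfrac{64}{3}\) for every \(m\). The following sketch (again ours, not the paper's) contrasts this constant \(P\)-risk with the vanishing Prokhorov bound (12):

```python
import numpy as np

def risk_P_of_gm(m, n_mc=10**6, seed=0):
    """Monte Carlo estimate of R_{L,P}(g_m) with phi(t) = t^2, c_m = sqrt(m);
    under P we have Y = 0 a.s., so the loss at x is simply g_m(x)**2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_mc)
    c_m = np.sqrt(m)
    gx = np.where(x <= 4.0 / m, -m * c_m * x + 4.0 * c_m, 0.0)
    return np.mean(gx ** 2)               # exact value: 64/3 for every m

for m in (10, 100, 1000):
    print(f"m = {m:4d}   d_Pro(Q_m, P) <= {4.0 / m:.4f}   "
          f"R_L,P(g_m) ~ {risk_P_of_gm(m):.1f}")
# The Prokhorov bound shrinks like 4/m while the P-risk of the Q_m-optimal
# predictor stays near 64/3 ~ 21.3: arbitrarily small contamination keeps
# the P-risk bounded away from 0.
```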

Inequalities (11) and (12) imply that \(T_n\), \(n \in \mathbb{N}\), is not qualitatively risk-robust.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hable, R., Christmann, A. (2013). Robustness Versus Consistency in Ill-Posed Classification and Regression Problems. In: Giusti, A., Ritter, G., Vichi, M. (eds) Classification and Data Mining. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28894-4_4

