Localization of VC Classes: Beyond Local Rademacher Complexities

Zhivotovskiy, Nikita; Hanneke, Steve

doi:10.1007/978-3-319-46379-7_2

Nikita Zhivotovskiy^16,17 &
Steve Hanneke¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9925))

Included in the following conference series:

International Conference on Algorithmic Learning Theory

1259 Accesses

Abstract

In statistical learning the excess risk of empirical risk minimization (ERM) is controlled by $\left( \frac{\text {COMP}_{n}(\mathcal {F})}{n}\right) ^{\alpha }$, where n is a size of a learning sample, $\text {COMP}_{n}(\mathcal {F})$ is a complexity term associated with a given class $\mathcal {F}$ and $\alpha \in [\frac{1}{2}, 1]$ interpolates between slow and fast learning rates. In this paper we introduce an alternative localization approach for binary classification that leads to a novel complexity measure: fixed points of the local empirical entropy. We show that this complexity measure gives a tight control over $\text {COMP}_{n}(\mathcal {F})$ in the upper bounds under bounded noise. Our results are accompanied by a novel minimax lower bound that involves the same quantity. In particular, we practically answer the question of optimality of ERM under bounded noise for general VC classes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Alexander, K.S.: Rates of growth and sample moduli for weighted empirical processes indexed by sets. Probab. Theory Relat. Fields 75, 379–423 (1987)
Article MathSciNet MATH Google Scholar
Balcan, M.F., Long, P.M.: Active and passive learning of linear separators under log-concave distributions. In: 26th Conference on Learning Theory (2013)
Google Scholar
Bartlett, P.L., Bousquet, O., Mendelson, S.: Local Rademacher complexities. Ann. Stat. 33(4), 1497–1537 (2005)
Article MathSciNet MATH Google Scholar
Boucheron, S., Bousquet, O., Lugosi, G.: Theory of classification: a survey of recent advances. ESAIM: Probab. Stat. 9, 323–375 (2005)
Article MathSciNet MATH Google Scholar
Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Applications of Mathematics, vol. 31. Springer, New York (1996)
MATH Google Scholar
Giné, E., Koltchinskii, V.: Concentration inequalities and asymptotic results for ratio type empirical processes. Ann. Probab. 34(3), 1143–1216 (2006)
Article MathSciNet MATH Google Scholar
Hanneke, S.: Theory of disagreement-based active learning. Found. Trends Mach. Learn. 7(2–3), 131–309 (2014)
Article MATH Google Scholar
Hanneke, S., Yang, L.: Minimax analysis of active learning. J. Mach. Learn. Res. 16(12), 3487–3602 (2015)
MathSciNet MATH Google Scholar
Hanneke, S.: Refined error bounds for several learning algorithms (2015). http://arXiv.org/abs/1512.07146
Haussler, D., Littlestone, N., Warmuth, M.: Predicting $\{0, 1\}$-functions on randomly drawn points. Inf. Comput. 115, 248–292 (1994)
Article MathSciNet MATH Google Scholar
Haussler, D.: Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik–Chervonenkis dimension. J. Combin. Theory Ser. A 69, 217–232 (1995)
Article MathSciNet MATH Google Scholar
Koltchinskii, V.: Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Stat. 34(6), 2593–2656 (2006)
Article MathSciNet MATH Google Scholar
Le Cam, L.M.: Convergence of estimates under dimensionality restrictions. Ann. Statist. 1, 38–53 (1973)
Article MathSciNet MATH Google Scholar
Lecué, G., Mitchell, C.: Oracle inequalities for cross-validation type procedures. Electron. J. Stat. 6, 1803–1837 (2012)
Article MathSciNet MATH Google Scholar
Lecué, G., Mendelson, S.: Learning subgaussian classes: upper and minimax bounds (2013). http://arXiv.org/abs/1305.4825
Liang, T., Rakhlin, A., Sridharan, K.: Learning with square loss: localization through offset Rademacher complexity. In: Proceedings of The 28th Conference on Learning Theory (2015)
Google Scholar
Massart, P.: Concentration Inequalties and Model Selection. Ecole dEtè de Probabilités, Saint Flour. Springer, New York (2003)
Google Scholar
Massart, P., Nédélec, E.: Risk bounds for statistical learning. Ann. Stat. 34(5), 2326–2366 (2006)
Article MathSciNet MATH Google Scholar
Mendelson, S.: ‘Local’ vs. ‘global’ parameters – breaking the Gaussian complexity barrier (2015). http://arXiv.org/abs/1504.02191
Raginsky, M., Rakhlin, A.: Lower bounds for passive and active learning. In: Advances in Neural Information Processing Systems 24, NIPS (2011)
Google Scholar
Rakhlin, A., Sridharan, K., Tsybakov, A.B.: Empirical entropy, minimax regret and minimax risk. Bernoulli (2015, forthcoming)
Google Scholar
Talagrand, M.: Upper and Lower Bounds for Stochastic Processes. Springer, Heidelberg (2014)
Book MATH Google Scholar
Vapnik, V., Chervonenkis, A.: On the uniform convergence of relative frequencies of events to their probabilities. Proc. USSR Acad. Sci. 181(4), 781–783 (1968). English tranlation: Soviet Math. Dokl. 9, 915–918
MathSciNet MATH Google Scholar
Vidyasagar, M.: Learning and Generalization with Applications to Neural Networks, 2nd edn. Springer, Heidelberg (2003)
MATH Google Scholar
Yang, Y., Barron, A.: Information-theoretic determination of minimax rates of convergence. Ann. Stat. 27, 1564–1599 (1999)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgments

The authors would like to thank Sasha Rakhlin for his suggestion to use offset Rademacher processes to analyze binary classification under Tsybakov noise conditions and anonymous reviewers for their helpful comments. NZ was supported solely by the Russian Science Foundation grant (project 14-50-00150).

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology, Moscow, Russia
Nikita Zhivotovskiy
Institute for Information Transmission Problems, Moscow, Russia
Nikita Zhivotovskiy
Princeton, NJ, 08542, USA
Steve Hanneke

Authors

Nikita Zhivotovskiy
View author publications
You can also search for this author in PubMed Google Scholar
Steve Hanneke
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nikita Zhivotovskiy .

Editor information

Editors and Affiliations

Montanuniversität Leoben , Leoben, Austria
Ronald Ortner
Ruhr-Uni-Bochum , Bochum, Germany
Hans Ulrich Simon
University of Regina , Regina, Saskatchewan, Canada
Sandra Zilles

Appendix

Proof

(Theorem 1 ). Let $\text {DIS}_{0}$ be a disagreement set of the version space of first $\lfloor n/2 \rfloor $ instances of the learning sample. The random error set will be denoted by $E_{1} = \{x \in \mathcal {X}|\hat{f}(x) \ne f^{*}(x)\}$. Using symmetrization Lemmas 2 and 1 we have $ \mathbb {E}P(E_{1}) = \mathbb {E}R(\hat{f}) \le \mathbb {E}\sup \limits _{g \in \mathcal {G}_{f^*}}(Pg - (1 + c)P_{n}g) \le \frac{2\left( 1 + \frac{c}{2}\right) ^2}{c}\frac{\log \left( \mathcal {S}_{\mathcal {F}}\left( n\right) \right) }{n} $ for $c > 0$. We fix $c = 2$ and prove that for any distribution $\mathbb {E}P(E_{1}) \le \frac{4\log \left( \mathcal {S}_{\mathcal {F}}\left( n\right) \right) }{n}$. Now we use $ R(\hat{f}) = P(E_{1}|\text {DIS}_{0})P(\text {DIS}_{0}). $ Let $\xi = |\text {DIS}_{0} \cap \{X_{\lfloor n/2 \rfloor + 1}, \ldots , X_{n}\}|$. Conditionally on the first $\lfloor n/2 \rfloor $ instances $\xi $ has binomial distribution. Expectations with respect to the first and the last parts of the sample will be denoted respectfully by $\mathbb {E}$ and $\mathbb {E}'$. Conditionally on $\{x_{1}, \ldots , x_{\lfloor n/2 \rfloor }\}$ we introduce two events: $ A_{1}: \xi < \frac{nP(\text {DIS}_{0})}{4}$ and $ A_{2}: \xi > \frac{3nP(\text {DIS}_{0})}{4}. $ Using Chernoff bounds we have $P(A_{1}) \le \exp \left( -\frac{nP(\text {DIS}_{0})}{16}\right) $ and $P(A_{2}) \le \exp \left( -\frac{nP(\text {DIS}_{0})}{16}\right) $. Denote $A = A_{1}\cup A_{2}$. Then $ \mathbb {E}'P(E_{1}|\text {DIS}_{0}) = \mathbb {E}'\left[ P(E_{1}|\text {DIS}_{0})\Big |\overline{A}\right] P(\overline{A}) + \mathbb {E}'\left[ P(E_{1}|\text {DIS}_{0})\Big |A\right] P(A). $ For the first term we have $ \mathbb {E}'\left[ P(E_{1}|\text {DIS}_{0})\Big |\overline{A}\right] P(\overline{A}) \le \frac{16\log \left( \mathcal {S}_{\mathcal {F}}\left( \frac{3nP(\text {DIS}_{0})}{4}\right) \right) }{nP(\text {DIS}_{0})} $ We can directly prove for the second term that $\mathbb {E}'\left[ P(E_{1}|\text {DIS}_{0})\Big |A\right] P(\text {DIS}_{0})P(A) \le \frac{12}{n}$. It easy to see, that for all natural k, r we have $\left( \mathcal {S}_{\mathcal {F}}(kr)\right) ^{\frac{1}{r}} \le \mathcal {S}_{\mathcal {F}}(k)$. Finally, $ \mathbb {E}R(\hat{f}) \le \mathbb {E}\tfrac{16\log \left( \mathcal {S}_{\mathcal {F}}\left( \tfrac{3nP(\text {DIS}_{0})}{4}\right) \right) }{n} + \tfrac{12}{n} \le \tfrac{40\log \left( \mathcal {S}_{\mathcal {F}}\left( \mathbf {s}\right) \right) }{n} + \tfrac{12}{n}. $ $\square $

Proof

(Lemma 4 ). Once again, given $X_1,\ldots ,X_n$, let $V = \{ (g(X_1),\ldots ,g(X_n)) : g \in \mathcal {G}\}$ denote the set of binary vectors corresponding to the values of functions in $\mathcal {G}$. As above, for a fixed $\gamma $ and fixed minimal $\gamma $-covering subset $\mathcal {N}_{\gamma } \subseteq V$, for each $v \in V$, p(v) will denote the closest vector to v in $\mathcal {N}_{\gamma }$. We will denote by $\mathbb {E}_{\xi }$ the conditional expectation over the $\xi _i$ variables, given $X_1,\ldots ,X_n$. We follow the decomposition proposed by Liang, Rakhlin, and Sridharan [16]:

$$\begin{aligned}&\frac{1}{n}\mathbb {E}_{\xi }\max \limits _{v \in V}\left( \sum \limits _{i = 1}^{n}\xi _{i}v_{i}- cv_{i}\right) \le \frac{1}{n}\mathbb {E}_{\xi }\max \limits _{v \in V}\left( \sum \limits _{i = 1}^{n}\xi _{i}\!\left( v_{i} \!-\! p(v)_{i}\right) \right) \\&+ \frac{1}{n}\mathbb {E}_{\xi }\max \limits _{v \in V}\left( \sum \limits _{i = 1}^{n}\frac{c}{4}p(v)_{i} \!-\! cv_{i}\right) + \frac{1}{n}\mathbb {E}_{\xi }\max \limits _{v \in V}\left( \sum \limits _{i = 1}^{n}\xi _{i}p(v)_{i} \!-\! \frac{c}{4}p(v)_{i}\right) . \end{aligned}$$

The first term is $\lesssim \frac{\gamma }{n}$ by the $\gamma $-cover property and the fact that $|\xi _i| \lesssim 1$. Furthermore it is easy to show that the second term is at most $\frac{c}{4} \frac{\gamma }{n}$. Now we analyze the last term carefully. First we use the standard peeling argument. Given a set W of binary vectors we define $W[a, b] = \{w \in W|a \le \rho _{H}(w, 0) < b\}$.

$$\begin{aligned}&\mathbb {E}_{\xi }\max \limits _{v \in V}\left( \sum \limits _{i = 1}^{n}\xi _{i}p(v)_{i}- \frac{c}{4}p(v)_{i}\right) =\mathbb {E}_{\xi }\max \limits _{v \in \mathcal {N}_{\gamma }}\left( \sum \limits _{i = 1}^{n}\xi _{i}v_{i}- \frac{c}{4}v_{i}\right) \\&\le \mathbb {E}_{\xi }\max \limits _{v \in \mathcal {N}_{\gamma }[0, 2\gamma /c]}\left( \xi _{i}v_{i}- \frac{c}{4}v_{i}\right) + \sum \limits _{k = 1}^{\infty }\mathbb {E}_{\xi }\max \limits _{\mathcal {N}_{\gamma }\left[ 2^{k}\gamma /c, 2^{k + 1}\gamma /c\right] }\left( \sum \limits _{i = 1}^{n}\xi _{i}v_{i}- \frac{c}{4}v_{i}\right) _{+}. \end{aligned}$$

The first term is upper bounded by $\frac{2\log (\mathcal {M}^{\text {loc}}_{1}(V, \gamma , n,c))}{cn}$ by Lemma 1 and by noting that $|\mathcal {N}_{\gamma }[0, 2\gamma /c]| \!\!\le \!\! \mathcal {M}_{1}(\mathcal {B}_{H}(0, (2\gamma )/c,\{X_1,\ldots ,X_n\}), (2\gamma )/2)\!\le \! \mathcal {M}^{\text {loc}}_{1}(V,\gamma ,n,c)$. Now we upper-bound the second term. We start with an arbitrary summand. For any $\lambda > 0$, we have

$$\begin{aligned}&\mathbb {E}_{\xi }\max \limits _{v \in \{0\} \cup \mathcal {N}_{\gamma }\left[ 2^{k}\gamma /c, 2^{k + 1}\gamma /c\right] }\left( \sum \limits _{i = 1}^{n}\xi _{i}v_{i}- \frac{c}{4}v_{i}\right) \\&\le \frac{1}{\lambda }\ln \left( \sum \limits _{v \in \mathcal {N}_{\gamma }\left[ 2^{k}\gamma /c, 2^{k + 1}\gamma /c\right] }\mathbb {E}_{\xi }\exp \left\{ \sum \limits _{i = 1}^{n}\lambda \xi _{i}v_{i} - \frac{\lambda c}{4} v_{i}\right\} + 1\right) \\&\le \frac{1}{\lambda }\ln \left( \left| \mathcal {N}_{\gamma }\left[ 2^{k}\gamma /c, 2^{k + 1}\gamma /c\right] \right| \exp \left\{ 2^{k-2}\gamma (4\lambda ^2 - \lambda c)/c\right\} + 1\right) \\&\le \frac{1}{\lambda }\ln \left( \left( \mathcal {M}^{\text {loc}}_{1}(\mathcal {G}, 2\gamma , n, c)\right) ^{2^{k + 1}}\exp \left\{ 2^{k-2}\gamma (4\lambda ^2 - \lambda c)/c\right\} + 1\right) . \end{aligned}$$

Here we used that $\left| \mathcal {M}_{\gamma }\left[ 0, 2^{k + 1}\gamma /c\right] \right| \le \left| \mathcal {M}^{\text {loc}}_{1}(\mathcal {G}, 2\gamma , n, c)\right| ^{2^{k + 1}}$ and that any minimal covering is also a packing. We fix $\gamma = K\gamma ^{\text {loc}}_{c, c}(n)$ for some $K > 2$. Observe that local entropy is nonincreasing and $K\gamma ^{\text {loc}}_{c, c}(n) > 2\gamma ^{\text {loc}}_{c,c}(n) \ge \gamma ^{\text {loc}}_{c, c}(n) + 1$. Thus,

$$\begin{aligned}&\ln \left( \exp \left\{ 2^{k + 1}\log \left( \mathcal {M}^{\text {loc}}_{1}(V, 2K\gamma ^{\text {loc}}_{c, c}(n), n, c)\right) + 2^{k-2}K\gamma ^{\text {loc}}_{c, c}(n)(4\lambda ^2 - \lambda c)/c\right\} + 1\right) \\&\le \ln \left( \exp \left\{ 2^{k + 1}c(\gamma ^{\text {loc}}_{c, c}(n) + 1) + 2^{k-2}K\gamma ^{\text {loc}}_{c, c}(n)(4\lambda ^2 - \lambda c)/c\right\} + 1\right) . \end{aligned}$$

Then we have for $\lambda = \frac{c}{8}$,

$$\begin{aligned}&\sum \limits _{k = 1}^{\infty }\frac{8}{c}\ln \left( \exp \left( 2^{k + 1}\log \left( \mathcal {M}^{\text {loc}}_{1}(\mathcal {G}, 2K\gamma ^{\text {loc}}_{c, c}(n), n)\right) \right) \exp \left( -2^{k - 6}Kc\gamma ^{\text {loc}}_{c, c}(n)\right) + 1\right) \\&\le \sum \limits _{k = 1}^{\infty }\frac{8}{c}\ln \left( \exp \left( 2^{k + 2}c\gamma ^{\text {loc}}_{c, c}(n) -2^{k - 6}Kc\gamma ^{\text {loc}}_{c, c}(n)\right) + 1\right) . \end{aligned}$$

We set $K = 2^{9}$ and have $\sum \limits _{k = 1}^{\infty }\ln \left( \exp \left( 2^{k + 2}c\gamma ^{\text {loc}}_{c, c}(n) -2^{k - 6}Kc\gamma ^{\text {loc}}_{c, c}(n)\right) + 1\right) \le C, $ where $C > 0$ is an absolute constant. Here we used that $\ln (x + 1) \le x$ for $x > 0$ and $c\gamma ^{\text {loc}}_{c, c} \gtrsim 1$. Combining with the first two terms we finish the proof. $\square $

Proof

(Proposition 1). The first part of the proof closely follows the proof of Theorem 17 in [8], with slight modifications, to arrive at an upper bound on $\mathcal {M}^{\text {loc}}_{1}(\mathcal {F}, \gamma , n, h)$. The suprema in the definition of local empirical entropy are achieved at some set $\{x_{1}, \ldots , x_{n}\}$, some function $f \in \mathcal {F}$, and some $\varepsilon \in [\gamma ,n]$. Letting $r = \varepsilon /n$, denote by $\mathcal {M}_r$ the maximal (rn / 2)-packing (under $\rho _H$) of $\mathcal {B}_{H}(f,rn/h,\{x_1,\ldots ,x_n\})$, so that $|\mathcal {M}_r| = \mathcal {M}^{\text {loc}}_{1}(\mathcal {F},\gamma ,n,h)$. Also introduce a uniform probability measure $P_X$ on $\{x_1,\ldots ,x_n\}$ and fix $m = \left\lceil \frac{4}{r}\log (|\mathcal {M}_r|)\right\rceil $. Let $X_{1}, \ldots , X_{m}$ be m independent $P_X$-distributed random variables, and let A denote the event that, for all $g,g' \in \mathcal {M}_{r}$ with $g \ne g'$, there exists an $i \in \{1, \ldots , n\}$ such that $g(X_{i}) \ne g'(X_{i})$. For a given pair of distinct functions $g,g' \in \mathcal {M}_r$, they disagree on some $X_i$ with probability $ 1 - (1 - P_X(g(X) \ne g'(X)))^m > 1 - \exp (-rm/2) \ge 1 - \frac{1}{|\mathcal {M}_r|^2}. $ Using a union bound and summing over all possible unordered pairs $g, g' \in \mathcal {M}_r$ will give us that $\mathbb {P}(A) > \frac{1}{2}$. On the event A, functions in $\mathcal {M}_{r}$ realize distinct classifications of $X_{1}, \ldots , X_{m}$. For any $X_i \notin \text {DIS}(\mathcal {B}_{H}(f, rn/h, \{x_{1}, \ldots , x_{n}\})$, all classifiers in $\mathcal {M}_{r}$ agree. Thus, $|\mathcal {M}_r|$ is bounded by the number of classifications $\{X_{1}, \ldots , X_{m}\} \cap \text {DIS}(\mathcal {B}_{H}(f, rn/h))$ realized by classifiers in $\mathcal {F}$. By the Chernoff bound, on an event B with $\mathbb {P}(B) \ge \frac{1}{2}$ we have $ |\{X_{1}, \ldots , X_{m}\} \cap \text {DIS}(\mathcal {B}_{H}(f, rn/h))| \le 1 + 2eP_X(\text {DIS}(\mathcal {B}_{H}(f, rn/h))m. $ Using the definition of $\tau (\cdot )$ (Definition 2) we have $ 1 + 2eP_X(\text {DIS}(\mathcal {B}_{H}(f, rn/h)))m \le 1 + 2e \tau \left( \frac{r}{h}\right) \frac{r}{h} m \le 11e\tau \left( \frac{r}{h}\right) \frac{\log (|\mathcal {M}_{r}|)}{h}. $ With probability at least $\frac{1}{2}$, $ |\{X_{1}, \ldots , X_{m}\} \cap \text {DIS}(\mathcal {B}_{H}(f, rn/h))| \le 11e\tau \left( \frac{r}{h}\right) \frac{\log (|\mathcal {M}_{r}|)}{h}. $ Using the union bound, we have that with positive probability there exists a sequence of at most $11e\tau \left( \frac{r}{h}\right) \frac{\log (|\mathcal {M}_r|)}{h}$ elements, such that all functions in $\mathcal {M}_r$ classify this sequence distinctly. By the VC lemma [23], we therefore have that $ |\mathcal {M}_{r}| \le \left( \frac{11e^2\tau \left( \frac{r}{h}\right) \frac{\log (|\mathcal {M}_{r}|)}{h}}{d}\right) ^{d}. $ Using Corollary 4.1 from [24] we have $ \log (|\mathcal {M}_{r}|) \le 2d\log \left( 11e^2\tau \left( \frac{r}{h}\right) \frac{1}{h}\right) . $ Using $\tau \left( \frac{r}{h}\right) \le \mathbf {s} \wedge \frac{h}{r} \le \mathbf {s} \wedge \frac{nh}{\gamma }$ (Theorem 10 in [8]) we finally have $ \log (\mathcal {M}^{\text {loc}}_{1}(\mathcal {F}, \gamma , n, h)) \le 2d\log \left( 11e^2\left( \frac{n}{\gamma } \wedge \frac{\mathbf {s}}{h}\right) \right) . $ Observe that $ h\gamma ^{\text {loc}}_{h, h}(n) \le 2d\log \left( 11e^2\left( \frac{n}{\gamma ^{\text {loc}}_{h, h}(n)} \wedge \frac{\mathbf {s}}{h}\right) \right) . $ We have $\gamma ^{\text {loc}}_{h, h}(n) \le \frac{2d\log \left( 11e^2\frac{\mathbf {s}}{h}\right) }{h}$. If $\gamma = \frac{2d\log \left( 11e^2\frac{nh}{d}\right) }{h}$, then $h\gamma = 2d\log \left( 11e^2\frac{nh}{d}\right) $, but $2d\log \left( 11e^2\frac{n}{\gamma }\right) \le 2d\log \left( 11e^2\frac{nh}{d}\right) $ if $h > \frac{d}{11en}$. Finally, we have $ \gamma ^{\text {loc}}_{h, h}(n) \le \frac{2d\log \left( 11e^2\left( \frac{nh}{d} \wedge \frac{\mathbf {s}}{h}\right) \right) }{h}. $ Now we prove the lower bound. From (2) established above, we know that $\frac{\gamma ^{\text {loc}}_{h, h}(n)}{n}$ is, up to an absolute constant, a distribution-free upper bound for $\mathbb {E}(R(\hat{f}) - R(f^{*}))$, holding for all ERM learners $\hat{f}$. Then a lower bound on $\sup \limits _{P \in \mathcal {P}(h,\mathcal {F})} \mathbb {E}(R(\hat{f}) - R(f^{*}))$ holding for any ERM learner is also a lower bound for $\frac{\gamma ^{\text {loc}}_{h, h}(n)}{n}$. In particular, it is known [9, 18] that for any learning procedure $\tilde{f}$, if $h \ge \sqrt{\frac{d}{n}}$, then $\sup \limits _{P \in \mathcal {P}(h,\mathcal {F})} \mathbb {E}(R(\tilde{f}) - R(f^{*})) \gtrsim \frac{d + (1-h)\log (n h^2 \wedge \mathbf {s})}{nh}$, while if $h < \sqrt{\frac{d}{n}}$ then $\sup \limits _{P \in \mathcal {P}(h,\mathcal {F})} \mathbb {E}(R(\tilde{f}) - R(f^{*})) \gtrsim \sqrt{\frac{d}{n}}$. Furthermore, in the particular case of ERM, [9] proves that any upper bound on $\sup \limits _{P \in \mathcal {P}(1,\mathcal {F})} \mathbb {E}(R(\hat{f}) - R(f^*))$ holding for all ERM learners $\hat{f}$ must have size, up to an absolute constant, at least $\frac{\log (n \wedge \mathbf {s})}{n}$. Together, these lower bounds imply $\gamma ^{\text {loc}}_{h, h}(n) \gtrsim \frac{d + \log (n h^2 \wedge \mathbf {s})}{h} \wedge \sqrt{d n}$. $\square $

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhivotovskiy, N., Hanneke, S. (2016). Localization of VC Classes: Beyond Local Rademacher Complexities. In: Ortner, R., Simon, H., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2016. Lecture Notes in Computer Science(), vol 9925. Springer, Cham. https://doi.org/10.1007/978-3-319-46379-7_2

Download citation

DOI: https://doi.org/10.1007/978-3-319-46379-7_2
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46378-0
Online ISBN: 978-3-319-46379-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Localization of VC Classes: Beyond Local Rademacher Complexities

Abstract

Access this chapter

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix

Appendix

Proof

Proof

Proof

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation