
On Algorithmic Statistics for Space-Bounded Algorithms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10304))

Abstract

Algorithmic statistics studies explanations of observed data that are good in the algorithmic sense: an explanation should be simple, i.e., have small Kolmogorov complexity, and capture all the algorithmically discoverable regularities in the data. However, this idea cannot be used in practice because Kolmogorov complexity is not computable.

In this paper we develop algorithmic statistics using space-bounded Kolmogorov complexity. We prove an analogue of one of the main results of ‘classic’ algorithmic statistics (on the connection between optimality and randomness deficiencies). The main tool of our proof is the Nisan–Wigderson generator.


Notes

  1. The definition and basic properties of Kolmogorov complexity can be found in the textbooks [5, 13]; for a short survey see [11].

  2. The Kolmogorov complexity of a set A is defined as follows. We fix a computable bijection \(A \mapsto [A]\) from the family of finite sets to the set of binary strings, called an encoding. Then we define \(\mathrm{C}(A)\) as the complexity \(\mathrm{C}([A])\) of the code [A] of A.

  3. The randomness deficiency of a string x with respect to a distribution P is defined as \(d(x \mid P) := -\log P(x) - \mathrm{C}(x \mid P)\); the optimality deficiency is defined as \(\delta(x, P) := \mathrm{C}(P) - \log P(x) - \mathrm{C}(x)\).

  4. Such a universal machine does exist – see [5].

  5. Theorem 1.2 in [8] has another formulation: it does not contain any information about \(|\widehat{f}|\). However, from the proof of the theorem it follows that the needed program (denote it by \(\widehat{f}_1\)) is obtained from f by an algorithmic transformation. Therefore there exists a program \(\widehat{f}\) that works functionally like \(\widehat{f}_1\) such that \(|\widehat{f}| \le |f| + O(1)\).

     Also, Theorem 1.2 does not assume that \(\Pr[f(x)]\) can belong to \([\frac{1}{3}, \frac{2}{3}]\). However, this assumption is not used in the proof of Theorem 1.2.
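The two deficiencies defined in the notes above can be illustrated with a toy numeric sketch. Since Kolmogorov complexity is uncomputable, the complexity values below are hypothetical placeholders chosen for the example, not computed quantities:

```python
# Toy illustration of the deficiencies from note 3.
# The C-values are hypothetical placeholders: Kolmogorov complexity
# is uncomputable, so no real program can produce them.

def randomness_deficiency(log_prob, cond_complexity):
    # d(x | P) = -log P(x) - C(x | P)
    return -log_prob - cond_complexity

def optimality_deficiency(model_complexity, log_prob, complexity):
    # delta(x, P) = C(P) - log P(x) - C(x)
    return model_complexity - log_prob - complexity

# Uniform P on a set of 2^10 strings: -log P(x) = 10 for every x in it.
log_p = -10.0
d = randomness_deficiency(log_p, cond_complexity=7)       # 10 - 7 = 3
delta = optimality_deficiency(5, log_p, complexity=12)    # 5 + 10 - 12 = 3
print(d, delta)
```

For a uniform distribution both deficiencies measure, in bits, how far x is from being a ‘typical’ element of the model; the paper's main result relates these two quantities in the space-bounded setting.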

References

  1. Ajtai, M.: Approximate counting with uniform constant-depth circuits. In: Advances in Computational Complexity Theory, pp. 1–20. American Mathematical Society (1993)

  2. Buhrman, H., Fortnow, L., Laplante, S.: Resource-bounded Kolmogorov complexity revisited. SIAM J. Comput. 31(3), 887–905 (2002)

  3. Furst, M., Saxe, J.B., Sipser, M.: Parity, circuits, and the polynomial-time hierarchy. Math. Syst. Theory 17(1), 13–27 (1984)

  4. Kolmogorov, A.N.: Three approaches to the quantitative definition of information. Problems Inf. Transmission 1(1), 4–11 (1965). English translation published in Int. J. Comput. Math. 2, 157–168 (1968)

  5. Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn, p. 792. Springer, Heidelberg; 1st edn. 1993; 2nd edn. 1997

  6. Longpré, L.: Resource-bounded Kolmogorov complexity: a link between computational complexity and information theory. Ph.D. thesis, Cornell University, Ithaca, NY (1986)

  7. Musatov, D.: Improving the space-bounded version of Muchnik’s conditional complexity theorem via “naive” derandomization. Theory Comput. Syst. 55(2), 299–312 (2014)

  8. Nisan, N.: \(RL \subseteq SC\). J. Comput. Complex. 4, 1–11 (1994)


  9. Nisan, N.: Pseudorandom bits for constant depth circuits. Combinatorica 11, 63–70 (1991)


  10. Nisan, N., Wigderson, A.: Hardness vs randomness. J. Comput. Syst. Sci. 49(2), 149–167 (1994)


  11. Shen, A.: Around Kolmogorov complexity: basic notions and results. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds.) Measures of Complexity: Festschrift for Alexey Chervonenkis. Springer, Heidelberg (2015)

  12. Shen, A.: The concept of \((\alpha, \beta)\)-stochasticity in the Kolmogorov sense, and its properties. Sov. Math. Doklady 271(1), 295–299 (1983)

  13. Shen, A., Uspensky, V., Vereshchagin, N.: Kolmogorov Complexity and Algorithmic Randomness. MCCME, Moscow (2013) (in Russian). English translation: http://www.lirmm.fr/~ashen/kolmbook-eng.pdf

  14. Sipser, M.: A complexity theoretic approach to randomness. In: Proceedings of the 15th ACM Symposium on the Theory of Computing, pp. 330–335 (1983)


  15. Vereshchagin, N., Vitányi, P.: Kolmogorov’s structure functions with an application to the foundations of model selection. IEEE Trans. Inf. Theory 50(12), 3265–3290 (2004). Preliminary version: Proceedings of 43rd IEEE Symposium on Foundations of Computer Science, pp. 751–760 (2002)

  16. Vereshchagin, N.K., Vitányi, P.M.B.: Rate distortion and denoising of individual data using Kolmogorov complexity. IEEE Trans. Inf. Theory 56(7), 3438–3454 (2010)


Acknowledgments

I would like to thank Nikolay Vereshchagin and Alexander Shen for useful discussions, advice and remarks.

This work is supported by RFBR grant 16-01-00362 and supported in part by Young Russian Mathematics award and RaCAF ANR-15-CE40-0016-01 grant. The study has been funded by the Russian Academic Excellence Project ‘5-100’.

Author information

Correspondence to Alexey Milovanov.

Appendix

Symmetry of Information

Define \(\mathrm{CD}^m(A, B)\) as the minimal length of a program that takes as input a pair of strings (a, b) and outputs the pair of Boolean values \((a \in A,\ b \in B)\), using space at most m on every input.
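The definition above can be illustrated by a sketch of a “distinguishing program” for one fixed pair of sets. This is only a functional illustration: the sets are hard-coded toy examples, and the space bound m of the definition is not modeled here.

```python
# Illustrative "distinguishing program" for a fixed pair of sets (A, B):
# on input (a, b) it outputs the pair of Booleans (a in A, b in B).
# The hard-coded sets stand in for whatever a short description of
# (A, B) encodes; the space bound m is not modeled in this sketch.

A = {'00', '11'}
B = {'01'}

def distinguish(a, b):
    return (a in A, b in B)

print(distinguish('00', '01'))  # (True, True)
print(distinguish('10', '11'))  # (False, False)
```

\(\mathrm{CD}^m(A, B)\) is then the length of the shortest such program that respects the space bound m on all inputs.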

Lemma 4

(Symmetry of information). Assume \(A, B \subseteq \{0,1\}^n\). Then

$$ \text{(a)}\quad \forall m\quad \mathrm{CD}^p(A, B) \le \mathrm{CD}^m(A) + \mathrm{CD}^m(B \mid A) + O(\log(\mathrm{CD}^m(A,B) + m + n))$$

for \(p = m + \text{poly}(n + \mathrm{CD}^m(A,B))\).

$$ \text{(b)}\quad \forall m\quad \mathrm{CD}^p(A) + \mathrm{CD}^p(B \mid A) \le \mathrm{CD}^m(A, B) + O(\log(\mathrm{CD}^m(A,B) + m + n))$$

for \(p = 2m + \text{poly}(n + \mathrm{CD}^m(A,B))\).

Proof

(of Lemma 4(a)). The proof is similar to the proof of Theorem 4(a).

Proof

(of Lemma 4(b)). Let \(k := \mathrm{CD}^m(A, B)\). Denote by \(\mathcal{D}\) the family of pairs of sets (U, V) such that \(\mathrm{CD}^m(U,V) \le k\) and \(U, V \subseteq \{0,1\}^n\). It is clear that \(|\mathcal{D}| < 2^{k+1}\). Denote by \(\mathcal{D}_{A}\) the set of pairs from \(\mathcal{D}\) whose first element equals A. Let t satisfy \(2^t \le |\mathcal{D}_{A}| < 2^{t+1}\).

Let us prove that

  • \(\mathrm{CD}^p(B \mid A)\) does not exceed t significantly;

  • \(\mathrm{CD}^p(A)\) does not exceed \(k - t\) significantly.

Here \(p = m + O(n)\).

We start with the first statement. There exists a program that enumerates all sets from \(\mathcal{D}_{A}\) using A as an oracle and works in space \(2m + O(n)\). Indeed, such an enumeration can be done as follows: enumerate all programs of length at most k and verify the following condition for every pair of n-bit strings: first, the program uses at most m space on this input; second, the program’s first output bit equals 1 if the first n-bit string belongs to A, and 0 otherwise. Since some programs may loop, we need an additional \(m + O(n)\) space to detect this.

Append to this program the ordinal number of a program that distinguishes (A, B); since \(|\mathcal{D}_{A}| < 2^{t+1}\), this number takes at most \(t+1\) bits. Therefore \(\mathrm{CD}^p(B \mid A) \le t + O(\log(\mathrm{CD}^m(A,B) + m + n))\).

Now let us prove the second statement. Note that there exist at most \(2^{k-t+1}\) sets U such that \(|\mathcal{D}_U| \ge 2^t\) (A among them). Hence, if we construct a program that enumerates all sets with this property (and does not use much space), we are done: the set A can be described by its ordinal number in this enumeration.

Let us construct such a program. It works as follows:

enumerate all sets U that appear as first elements of pairs from \(\mathcal{D}\), i.e., enumerate (say, lexicographically) the programs that distinguish the corresponding pairs. Output a set U if the following properties hold: first, \(|\mathcal{D}_U| \ge 2^t\); second, U has not been met earlier (i.e., no program with a smaller lexicographic number distinguishes a pair from \(\mathcal{D}\) with first element U).

This program works in \(2m + \text{poly}(n + \mathrm{CD}^m(A,B))\) space (as required) and has length \(O(\log(\mathrm{CD}^m(A) + n + m))\).
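The counting step above relies only on the fibers \(\mathcal{D}_U\) being disjoint: if \(|\mathcal{D}| < 2^{k+1}\), fewer than \(2^{k-t+1}\) first components can have fibers of size at least \(2^t\). A small Python sanity check, with an arbitrary random family standing in for \(\mathcal{D}\) (the parameters are illustrative choices, not from the paper):

```python
import random
from collections import Counter

# Sanity check of the counting fact used in the proof: if |D| < 2^(k+1),
# then at most 2^(k-t+1) first components U can have |D_U| >= 2^t,
# simply because the fibers D_U partition D.

random.seed(0)
k, t = 10, 4
D = {(random.randrange(2 ** 6), random.randrange(2 ** 6))
     for _ in range(2 ** k)}                 # |D| <= 2^k < 2^(k+1)

fiber = Counter(u for u, v in D)             # |D_U| for each U
heavy = [u for u, c in fiber.items() if c >= 2 ** t]

assert len(heavy) <= 2 ** (k - t + 1)
print(len(heavy), 2 ** (k - t + 1))
```

The assertion can never fail, for any family: each heavy U accounts for at least \(2^t\) distinct pairs of \(\mathcal{D}\).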

Proof

(of Lemma 3). Let us show that \(\mathcal{B}\) satisfies property \((1)^*\) with probability at most \(2^{-n}\). Since \(\mathcal{B}\) satisfies property (2) with probability at most \(\frac{1}{4}\) (see the proof of Lemma 2), this suffices.

To this end, let us show that every part is ‘bad’ (i.e., contains at least \((n + k)^2 + 1\) sets from \(\mathcal{B}\)) with probability at most \(2^{-2n}\). The probability of this event equals the probability that a binomial random variable with parameters \((2^k, 2^{-k}(n + 2)\ln 2)\) is greater than \((n + k)^2\). Obtaining the needed upper bound for this probability is not difficult, but the corresponding formulas are cumbersome. Take \(w := 2^k\), \(p := 2^{-k}(n + 2)\ln 2\) and \(v := (n + k)^2\). We need to estimate

$$\sum_{i=v}^{w} \binom{w}{i} p^i(1-p)^{w-i} < w \cdot \binom{w}{v} p^v(1-p)^{w-v} < w \cdot \binom{w}{v} p^v < w\, \frac{(wp)^{v}}{v!}.$$

The first inequality holds since \(wp = (n+2)\ln 2 \le (n+k)^2 = v\). Now note that \(wp = (n+2)\ln 2 < 10n\). So

$$w \frac{(wp)^{v}}{v!} < \frac{2^k (10n)^{(n+k)^2}}{((n+k)^2)!} \ll 2^{-2n}. $$
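The estimate above can be checked numerically for small parameters. The sketch below evaluates the exact binomial tail and the bound \(w\,(wp)^v/v!\); the concrete n and k are illustrative choices satisfying \((n+k)^2 \le 2^k\), not values from the paper:

```python
import math

# Numeric check of the chain of inequalities for one small parameter
# choice (n, k are illustrative; any n, k with (n+k)^2 <= 2^k work).
n, k = 2, 8
w = 2 ** k
p = 2 ** (-k) * (n + 2) * math.log(2)
v = (n + k) ** 2

# Exact binomial tail Pr[X >= v] for X ~ Bin(w, p).
tail = sum(math.comb(w, i) * p ** i * (1 - p) ** (w - i)
           for i in range(v, w + 1))
# Final bound from the displayed estimate: w * (wp)^v / v!
bound = w * (w * p) ** v / math.factorial(v)

assert tail <= bound
assert bound < 2 ** (-2 * n)   # bound << 2^(-2n)
print(tail, bound)
```

For these parameters the tail is astronomically small, consistent with the claim that each part is ‘bad’ with probability at most \(2^{-2n}\).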


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Milovanov, A. (2017). On Algorithmic Statistics for Space-Bounded Algorithms. In: Weil, P. (ed.) Computer Science – Theory and Applications. CSR 2017. Lecture Notes in Computer Science, vol. 10304. Springer, Cham. https://doi.org/10.1007/978-3-319-58747-9_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58746-2

  • Online ISBN: 978-3-319-58747-9
