
On Algorithmic Statistics for Space-Bounded Algorithms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10304))

Abstract

Algorithmic statistics studies explanations of observed data that are good in the algorithmic sense: an explanation should be simple, i.e., have small Kolmogorov complexity, and capture all the algorithmically discoverable regularities in the data. However, this idea cannot be used in practice because Kolmogorov complexity is not computable.

In this paper we develop algorithmic statistics using space-bounded Kolmogorov complexity. We prove an analogue of one of the main results of ‘classic’ algorithmic statistics (on the connection between optimality and randomness deficiencies). The main tool of our proof is the Nisan–Wigderson generator.


Notes

  1. The definition and basic properties of Kolmogorov complexity can be found in the textbooks [5, 13]; for a short survey see [11].

  2. The Kolmogorov complexity of a set A is defined as follows. We fix a computable bijection \(A \mapsto [A]\) from the family of finite sets to the set of binary strings, called an encoding. Then we define \(\mathrm{C}(A)\) as the complexity \(\mathrm{C}([A])\) of the code [A] of A.

  3. The randomness deficiency of a string x with respect to a distribution P is defined as \(d(x \mid P) := -\log P(x) - \mathrm{C}(x \mid P)\); the optimality deficiency is defined as \(\delta(x, P) := \mathrm{C}(P) - \log P(x) - \mathrm{C}(x)\).

  4. Such a universal machine does exist – see [5].

  5. Theorem 1.2 in [8] has another formulation: it does not contain any information about \(|\widehat{f}|\). However, from the proof of the theorem it follows that the needed program (denote it by \(\widehat{f}_1\)) is obtained from f by an algorithmic transformation. Therefore there exists a program \(\widehat{f}\) that works functionally like \(\widehat{f}_1\) such that \(|\widehat{f}| \le |f| + O(1)\).

     Also, Theorem 1.2 does not assume that \(\Pr[f(x)]\) can belong to \([\frac{1}{3}, \frac{2}{3}]\). However, this assumption is not used in the proof of Theorem 1.2.
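The two deficiencies defined in the notes above can be illustrated with a toy numeric sketch. Since Kolmogorov complexity is uncomputable, the complexity values below are hypothetical placeholders chosen for the example, not computed quantities:

```python
# Toy illustration of the deficiencies from note 3.
# The C-values are hypothetical placeholders: Kolmogorov complexity
# is uncomputable, so no real program can produce them.

def randomness_deficiency(log_prob, cond_complexity):
    # d(x | P) = -log P(x) - C(x | P)
    return -log_prob - cond_complexity

def optimality_deficiency(model_complexity, log_prob, complexity):
    # delta(x, P) = C(P) - log P(x) - C(x)
    return model_complexity - log_prob - complexity

# Uniform P on a set of 2^10 strings: -log P(x) = 10 for every x in it.
log_p = -10.0
d = randomness_deficiency(log_p, cond_complexity=7)       # 10 - 7 = 3
delta = optimality_deficiency(5, log_p, complexity=12)    # 5 + 10 - 12 = 3
print(d, delta)
```

For a uniform distribution both deficiencies measure, in bits, how far x is from being a ‘typical’ element of the model; the paper's main result relates these two quantities in the space-bounded setting.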

References

  1. Ajtai, M.: Approximate counting with uniform constant-depth circuits. In: Advances in Computational Complexity Theory, pp. 1–20. American Mathematical Society (1993)

  2. Buhrman, H., Fortnow, L., Laplante, S.: Resource-bounded Kolmogorov complexity revisited. SIAM J. Comput. 31(3), 887–905 (2002)

  3. Furst, M., Saxe, J.B., Sipser, M.: Parity, circuits, and the polynomial-time hierarchy. Math. Syst. Theory 17(1), 13–27 (1984)

  4. Kolmogorov, A.N.: Three approaches to the quantitative definition of information. Problems Inf. Transmission 1(1), 4–11 (1965). English translation published in Int. J. Comput. Math. 2, 157–168 (1968)

  5. Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn, p. 792. Springer, Heidelberg; 1st edn. 1993; 2nd edn. 1997

  6. Longpré, L.: Resource-bounded Kolmogorov complexity: a link between computational complexity and information theory. Ph.D. thesis, Cornell University, Ithaca, NY (1986)

  7. Musatov, D.: Improving the space-bounded version of Muchnik’s conditional complexity theorem via “naive” derandomization. Theory Comput. Syst. 55(2), 299–312 (2014)

  8. Nisan, N.: \(RL \subseteq SC\). J. Comput. Complex. 4, 1–11 (1994)


  9. Nisan, N.: Pseudorandom bits for constant depth circuits. Combinatorica 11, 63–70 (1991)


  10. Nisan, N., Wigderson, A.: Hardness vs randomness. J. Comput. Syst. Sci. 49(2), 149–167 (1994)


  11. Shen, A.: Around Kolmogorov complexity: basic notions and results. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds.) Measures of Complexity: Festschrift for Alexey Chervonenkis. Springer, Heidelberg (2015)

  12. Shen, A.: The concept of \((\alpha, \beta)\)-stochasticity in the Kolmogorov sense, and its properties. Sov. Math. Doklady 271(1), 295–299 (1983)

  13. Shen, A., Uspensky, V., Vereshchagin, N.: Kolmogorov Complexity and Algorithmic Randomness. MCCME, Moscow (2013) (in Russian). English translation: http://www.lirmm.fr/~ashen/kolmbook-eng.pdf

  14. Sipser, M.: A complexity theoretic approach to randomness. In: Proceedings of the 15th ACM Symposium on the Theory of Computing, pp. 330–335 (1983)


  15. Vereshchagin, N., Vitányi, P.: Kolmogorov’s structure functions with an application to the foundations of model selection. IEEE Trans. Inf. Theory 50(12), 3265–3290 (2004). Preliminary version: Proceedings of 43rd IEEE Symposium on Foundations of Computer Science, pp. 751–760 (2002)

  16. Vereshchagin, N.K., Vitányi, P.M.B.: Rate distortion and denoising of individual data using Kolmogorov complexity. IEEE Trans. Inf. Theory 56(7), 3438–3454 (2010)


Acknowledgments

I would like to thank Nikolay Vereshchagin and Alexander Shen for useful discussions, advice and remarks.

This work is supported by RFBR grant 16-01-00362 and supported in part by Young Russian Mathematics award and RaCAF ANR-15-CE40-0016-01 grant. The study has been funded by the Russian Academic Excellence Project ‘5-100’.

Author information

Correspondence to Alexey Milovanov.

Appendix

Symmetry of Information

Define \(\mathrm{CD}^m(A, B)\) as the minimal length of a program that takes as input a pair of strings (a, b) and outputs the pair of Boolean values \((a \in A,\ b \in B)\), using space at most m on every input.
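The definition above can be illustrated by a sketch of a “distinguishing program” for one fixed pair of sets. This is only a functional illustration: the sets are hard-coded toy examples, and the space bound m of the definition is not modeled here.

```python
# Illustrative "distinguishing program" for a fixed pair of sets (A, B):
# on input (a, b) it outputs the pair of Booleans (a in A, b in B).
# The hard-coded sets stand in for whatever a short description of
# (A, B) encodes; the space bound m is not modeled in this sketch.

A = {'00', '11'}
B = {'01'}

def distinguish(a, b):
    return (a in A, b in B)

print(distinguish('00', '01'))  # (True, True)
print(distinguish('10', '11'))  # (False, False)
```

\(\mathrm{CD}^m(A, B)\) is then the length of the shortest such program that respects the space bound m on all inputs.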

Lemma 4

(Symmetry of information). Assume \(A, B \subseteq \{0,1\}^n\). Then

$$ \text{(a)}\quad \forall m\quad \mathrm{CD}^p(A, B) \le \mathrm{CD}^m(A) + \mathrm{CD}^m(B \mid A) + O(\log(\mathrm{CD}^m(A,B) + m + n))$$

for \(p = m + \text{poly}(n + \mathrm{CD}^m(A,B))\).

$$ \text{(b)}\quad \forall m\quad \mathrm{CD}^p(A) + \mathrm{CD}^p(B \mid A) \le \mathrm{CD}^m(A, B) + O(\log(\mathrm{CD}^m(A,B) + m + n))$$

for \(p = 2m + \text{poly}(n + \mathrm{CD}^m(A,B))\).

Proof

(of Lemma 4(a)). The proof is similar to the proof of Theorem 4(a).

Proof

(of Lemma 4(b)). Let \(k := \mathrm{CD}^m(A, B)\). Denote by \(\mathcal{D}\) the family of pairs of sets (U, V) such that \(\mathrm{CD}^m(U,V) \le k\) and \(U, V \subseteq \{0,1\}^n\). It is clear that \(|\mathcal{D}| < 2^{k+1}\). Denote by \(\mathcal{D}_{A}\) the set of pairs from \(\mathcal{D}\) whose first element equals A. Let t satisfy \(2^t \le |\mathcal{D}_{A}| < 2^{t+1}\).

Let us prove that

  • \(\mathrm{CD}^p(B \mid A)\) does not exceed t significantly;

  • \(\mathrm{CD}^p(A)\) does not exceed \(k - t\) significantly.

Here \(p = m + O(n)\).

We start with the first statement. There exists a program that enumerates all sets from \(\mathcal{D}_{A}\) using A as an oracle and works in space \(2m + O(n)\). Indeed, such an enumeration can be done as follows: enumerate all programs of length at most k and verify the following condition for every pair of n-bit strings: first, the program uses at most m space on this input; second, the program’s first output bit equals 1 if the first n-bit string belongs to A, and 0 otherwise. Since some programs may loop, we need an additional \(m + O(n)\) space to detect this.

Append to this program the ordinal number of a program that distinguishes (A, B); since \(|\mathcal{D}_{A}| < 2^{t+1}\), this number takes at most \(t+1\) bits. Therefore \(\mathrm{CD}^p(B \mid A) \le t + O(\log(\mathrm{CD}^m(A,B) + m + n))\).

Now let us prove the second statement. Note that there exist at most \(2^{k-t+1}\) sets U such that \(|\mathcal{D}_U| \ge 2^t\) (A among them). Hence, if we construct a program that enumerates all sets with this property (and does not use much space), we are done: the set A can be described by its ordinal number in this enumeration.

Let us construct such a program. It works as follows:

enumerate all sets U that appear as first elements of pairs from \(\mathcal{D}\), i.e., enumerate (say, lexicographically) the programs that distinguish the corresponding pairs. Output a set U if the following properties hold: first, \(|\mathcal{D}_U| \ge 2^t\); second, U has not been met earlier (i.e., no program with a smaller lexicographic number distinguishes a pair from \(\mathcal{D}\) with first element U).

This program works in \(2m + \text{poly}(n + \mathrm{CD}^m(A,B))\) space (as required) and has length \(O(\log(\mathrm{CD}^m(A) + n + m))\).
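The counting step above relies only on the fibers \(\mathcal{D}_U\) being disjoint: if \(|\mathcal{D}| < 2^{k+1}\), fewer than \(2^{k-t+1}\) first components can have fibers of size at least \(2^t\). A small Python sanity check, with an arbitrary random family standing in for \(\mathcal{D}\) (the parameters are illustrative choices, not from the paper):

```python
import random
from collections import Counter

# Sanity check of the counting fact used in the proof: if |D| < 2^(k+1),
# then at most 2^(k-t+1) first components U can have |D_U| >= 2^t,
# simply because the fibers D_U partition D.

random.seed(0)
k, t = 10, 4
D = {(random.randrange(2 ** 6), random.randrange(2 ** 6))
     for _ in range(2 ** k)}                 # |D| <= 2^k < 2^(k+1)

fiber = Counter(u for u, v in D)             # |D_U| for each U
heavy = [u for u, c in fiber.items() if c >= 2 ** t]

assert len(heavy) <= 2 ** (k - t + 1)
print(len(heavy), 2 ** (k - t + 1))
```

The assertion can never fail, for any family: each heavy U accounts for at least \(2^t\) distinct pairs of \(\mathcal{D}\).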

Proof

(of Lemma 3). Let us show that \(\mathcal{B}\) satisfies property \((1)^*\) with probability at most \(2^{-n}\). Since \(\mathcal{B}\) satisfies property (2) with probability at most \(\frac{1}{4}\) (see the proof of Lemma 2), this suffices.

To this end, let us show that every part is ‘bad’ (i.e., contains at least \((n + k)^2 + 1\) sets from \(\mathcal{B}\)) with probability at most \(2^{-2n}\). The probability of this event equals the probability that a binomial random variable with parameters \((2^k, 2^{-k}(n + 2)\ln 2)\) is greater than \((n + k)^2\). Obtaining the needed upper bound for this probability is not difficult, but the corresponding formulas are cumbersome. Take \(w := 2^k\), \(p := 2^{-k}(n + 2)\ln 2\) and \(v := (n + k)^2\). We need to estimate

$$\sum_{i=v}^{w} \binom{w}{i} p^i(1-p)^{w-i} < w \cdot \binom{w}{v} p^v(1-p)^{w-v} < w \cdot \binom{w}{v} p^v < w\, \frac{(wp)^{v}}{v!}.$$

The first inequality holds since \(wp = (n+2)\ln 2 \le (n+k)^2 = v\). Now note that \(wp = (n+2)\ln 2 < 10n\). So

$$w \frac{(wp)^{v}}{v!} < \frac{2^k (10n)^{(n+k)^2}}{((n+k)^2)!} \ll 2^{-2n}. $$
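The estimate above can be checked numerically for small parameters. The sketch below evaluates the exact binomial tail and the bound \(w\,(wp)^v/v!\); the concrete n and k are illustrative choices satisfying \((n+k)^2 \le 2^k\), not values from the paper:

```python
import math

# Numeric check of the chain of inequalities for one small parameter
# choice (n, k are illustrative; any n, k with (n+k)^2 <= 2^k work).
n, k = 2, 8
w = 2 ** k
p = 2 ** (-k) * (n + 2) * math.log(2)
v = (n + k) ** 2

# Exact binomial tail Pr[X >= v] for X ~ Bin(w, p).
tail = sum(math.comb(w, i) * p ** i * (1 - p) ** (w - i)
           for i in range(v, w + 1))
# Final bound from the displayed estimate: w * (wp)^v / v!
bound = w * (w * p) ** v / math.factorial(v)

assert tail <= bound
assert bound < 2 ** (-2 * n)   # bound << 2^(-2n)
print(tail, bound)
```

For these parameters the tail is astronomically small, consistent with the claim that each part is ‘bad’ with probability at most \(2^{-2n}\).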


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Milovanov, A. (2017). On Algorithmic Statistics for Space-Bounded Algorithms. In: Weil, P. (ed.) Computer Science – Theory and Applications. CSR 2017. Lecture Notes in Computer Science, vol. 10304. Springer, Cham. https://doi.org/10.1007/978-3-319-58747-9_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58746-2

  • Online ISBN: 978-3-319-58747-9
