On Hierarchical Compression and Power Laws in Nature

Franz, Arthur

doi:10.1007/978-3-319-63703-7_8


Conference paper
First Online: 15 July 2017


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10414)

Abstract

Since compressing data incrementally by a non-branching hierarchy has resulted in substantial efficiency gains for performing induction in previous work, we now explore branching hierarchical compression as a means for solving induction problems for generally intelligent systems. Even though assuming the compositionality of data generation and the locality of information may result in a loss of the universality of induction, this approach still has the potential to be general in the sense of reflecting the inherent structure that the laws of physics impose on real-world data. We prove that branching compression hierarchies (BCHs) give rise to a power-law dependence of the mutual algorithmic information between two strings on their distance. Since such power laws are a ubiquitous characteristic of natural data, this opens the possibility of compressing natural data efficiently with BCHs. Further, we show that such hierarchies guarantee the existence of short features in the data, which in turn increases the efficiency of induction even more.


Notes

1.
For notation and definitions please consult the Preliminaries section below.

References

[1] Solomonoff, R.J.: A formal theory of inductive inference. Part I. Inf. Control 7(1), 1–22 (1964)
[2] Solomonoff, R.J.: A formal theory of inductive inference. Part II. Inf. Control 7(2), 224–254 (1964)
[3] Lin, H.W., Tegmark, M.: Why does deep and cheap learning work so well? arXiv preprint arXiv:1608.08225 (2016)
[4] Franz, A.: Some theorems on incremental compression. In: Steunebrink, B., Wang, P., Goertzel, B. (eds.) AGI 2016. LNCS, vol. 9782, pp. 74–83. Springer, Cham (2016). doi:10.1007/978-3-319-41649-6_8
[5] Bak, P.: How Nature Works: The Science of Self-organized Criticality. Copernicus, New York (1996)
[6] Saremi, S., Sejnowski, T.J.: Hierarchical model of natural images and the origin of scale invariance. Proc. Natl. Acad. Sci. 110(8), 3071–3076 (2013)
[7] Lin, H.W., Tegmark, M.: Critical behavior from deep dynamics: a hidden dimension in natural language. arXiv preprint arXiv:1606.06737 (2016)
[8] Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications. Springer, New York (2009)


Author information

Authors and Affiliations

Independent Researcher, Odessa, Ukraine
Arthur Franz


Corresponding author

Correspondence to Arthur Franz.

Editor information

Editors and Affiliations

Australian National University, Canberra, Australian Capital Territory, Australia
Tom Everitt
OpenCog Foundation, Hong Kong, China
Ben Goertzel
St. Petersburg State University, St. Petersburg, Russia
Alexey Potapov

A Proofs

Proof (Lemma 1). Recall that $f_{l}$ and $p_{l}$ are the shortest feature and parameter of $q_{l-1}$ and are therefore independent, i.e. $K(q_{l-1})\mathop {=}\limits ^{+}l(f_{l})+K(p_{l})$, as proven in [4, Corollary 2]. From Eq. (3.1) we obtain

$$\begin{aligned} K(q_{0})&\mathop {=}\limits ^{+}l(f_{1})+K(p_{1})\mathop {=}\limits ^{+}l(f_{1})+\alpha _{1}K(q_{1})\mathop {=}\limits ^{+}l(f_{1})+\alpha _{1}\left( l(f_{2})+\alpha _{2}K(q_{2})\right) \\&\mathop {=}\limits ^{+} K(q_{h})\prod _{l=1}^{h}\alpha _{l}+\sum _{m=1}^{h}l(f_{m})\prod _{l=1}^{m-1}\alpha _{l} \end{aligned}\tag{A.1}$$

Since $f_{l}$ and $p_{l}$ cannot be made dependent by conditioning, we get $K(q_{l-1}|q_{h})\mathop {=}\limits ^{+}K(f_{l}|q_{h})+K(p_{l}|q_{h})$. Due to assumption (2), the first term becomes $K(f_{l}|q_{h})=K(f_{l})\mathop {=}\limits ^{+}l(f_{l})$. Therefore, the conditional version can be computed analogously to Eq. (A.1):

$$K(q_{0}|q_{h})\mathop {=}\limits ^{+}K(q_{h}|q_{h})\prod _{l=1}^{h}\alpha _{l}+\sum _{m=1}^{h}l(f_{m})\prod _{l=1}^{m-1}\alpha _{l} \tag{A.2}$$

Since $K(q_{h}|q_{h})=O(1)$, subtracting Eq. (A.2) from Eq. (A.1) gives the information in $q_{h}$ about $q_{0}$:

$$\begin{aligned} I(q_{h}:q_{0})\equiv K(q_{0})-K(q_{0}|q_{h})\mathop {=}\limits ^{+}K(q_{h})\prod _{l=1}^{h}\alpha _{l} \end{aligned}$$

$\square $
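The unrolling in Eq. (A.1) and the subtraction that yields $I(q_{h}:q_{0})$ can be checked numerically. The Python sketch below drops all additive $O(1)$ constants, treats complexities as real numbers, and uses made-up values for the ratios $\alpha_l$ (with $K(p_l)\mathop{=}\limits^{+}\alpha_l K(q_l)$, as in the first line of Eq. (A.1)) and for the feature lengths $l(f_l)$. The helper names `k_q0` and `closed_form` are hypothetical.

```python
import math

def k_q0(k_qh, alphas, feat_lens):
    """Unroll K(q_{l-1}) = l(f_l) + alpha_l * K(q_l) from l = h down to l = 1 (constants dropped)."""
    k = k_qh
    for alpha, lf in zip(reversed(alphas), reversed(feat_lens)):
        k = lf + alpha * k
    return k

def closed_form(k_qh, alphas, feat_lens):
    """Right-hand side of Eq. (A.1): K(q_h) * prod(alpha) + sum_m l(f_m) * prod_{l<m} alpha_l."""
    head = k_qh * math.prod(alphas)
    tail = sum(lf * math.prod(alphas[:m]) for m, lf in enumerate(feat_lens))
    return head + tail

alphas = [0.5, 0.8, 0.6]        # hypothetical alpha_1 .. alpha_h
feat_lens = [40.0, 25.0, 10.0]  # hypothetical l(f_1) .. l(f_h) in bits
k_qh, c = 200.0, 0.0            # K(q_h), and K(q_h|q_h) = O(1) taken as 0 here

# Recursion and closed form agree.
assert math.isclose(k_q0(k_qh, alphas, feat_lens), closed_form(k_qh, alphas, feat_lens))

# Eq. (A.1) minus Eq. (A.2): only the K(q_h) term survives.
info = k_q0(k_qh, alphas, feat_lens) - k_q0(c, alphas, feat_lens)
assert math.isclose(info, k_qh * math.prod(alphas))
```

The feature terms cancel exactly in the difference, which is why the information shared with the ancestor shrinks by the factor $\prod_l \alpha_l$ regardless of the feature lengths.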

Proof (Lemma 2). In general, we can expand [8, Theorem 3.9.1, p. 247]

$$\begin{aligned} K(y,z|a)\mathop {=}\limits ^{+}K(y|a)+K(z|y,K(y),a) \end{aligned}$$

and insert it into the independence relation, Eq. (3.4). This leads to

$$\begin{aligned} K(z|a)\mathop {=}\limits ^{+}K(z|y,K(y),a)\mathop {\le }\limits ^{+}K(z|y) \end{aligned}$$

where the last inequality holds because additional conditioning can only reduce the description length of $z$ [8, Theorem 2.1.2, p. 108]. Subtracting this inequality from $K(z)$ yields $K(z)-K(z|a)\mathop {\ge }\limits ^{+}K(z)-K(z|y)$. Inserting the definition of mutual information, $I(a:z)\equiv K(z)-K(z|a)$ and likewise $I(y:z)\equiv K(z)-K(z|y)$, on both sides gives $I(a:z)\mathop {\ge }\limits ^{+}I(y:z)$, from which the claim follows. $\square $
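Lemma 2 has a familiar Shannon-entropy counterpart, the data-processing inequality: if $y$ and $z$ are conditionally independent given $a$, then $I(y;z)\le I(a;z)$. The Python sketch below checks this analogue on a hypothetical toy model in which $a$ is a fair bit and $y$, $z$ are independent noisy copies of it. It illustrates the intuition only; it does not compute algorithmic information, and the noise levels `EPS_Y`, `EPS_Z` are arbitrary choices.

```python
import itertools
import math

def mutual_information(joint):
    """Shannon mutual information in bits of a joint pmf given as {(u, v): p}."""
    pu, pv = {}, {}
    for (u, v), p in joint.items():
        pu[u] = pu.get(u, 0.0) + p
        pv[v] = pv.get(v, 0.0) + p
    return sum(p * math.log2(p / (pu[u] * pv[v]))
               for (u, v), p in joint.items() if p > 0)

def flip(bit, eps):
    """Binary symmetric channel: list of (output, probability) pairs."""
    return [(bit, 1 - eps), (1 - bit, eps)]

# Hypothetical toy model: parent a is a fair bit; children y and z are
# independent noisy copies of a, so y and z are independent given a.
EPS_Y, EPS_Z = 0.1, 0.2
joint_az, joint_yz = {}, {}
for a in (0, 1):
    for (y, py), (z, pz) in itertools.product(flip(a, EPS_Y), flip(a, EPS_Z)):
        p = 0.5 * py * pz
        joint_az[(a, z)] = joint_az.get((a, z), 0.0) + p
        joint_yz[(y, z)] = joint_yz.get((y, z), 0.0) + p

i_az = mutual_information(joint_az)
i_yz = mutual_information(joint_yz)
print(f"I(a;z) = {i_az:.4f} bits, I(y;z) = {i_yz:.4f} bits")
assert i_yz <= i_az  # Shannon counterpart of Lemma 2
```

Passing through the extra noisy link to $y$ can only lose information about $z$, mirroring how information about a leaf is lost when moving away from the common ancestor.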

Proof (Theorem 1). First, Eq. (3.3) and Lemma 2 imply that $I(x_{i}:x_{j})$ is bounded by a quantity that decays exponentially with the height $h$ of the common ancestor $q_{h}$ of $x_{i}$ and $x_{j}$:

$$I(x_{i}:x_{j})\mathop {\le }\limits ^{+}K(q_{h})\cdot \prod _{l=1}^{h}\alpha _{l} \tag{A.3}$$

under our assumptions. Moreover, the maximal index distance between two leaves of a perfect tree grows exponentially with the height $h$ of their common ancestor:

$$d_{ij}<\prod _{l=1}^{h}\hat{b}_{l} \tag{A.4}$$

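where $\hat{b}_{l}$ is the average branching factor at level $l$ of the tree. Defining the total average branching factor $\bar{b}\equiv \left( \prod _{l=1}^{h}\hat{b}_{l}\right) ^{1/h}>d_{ij}^{1/h}$, we obtain $h>\log _{\bar{b}}(d_{ij})$. Since every $\alpha _{l}<1$ contributes a negative term $\log _{\bar{b}}(\alpha _{l})$, dropping the levels beyond $\log _{\bar{b}}(d_{ij})$ can only increase the sum, and we compute: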

$$\begin{aligned} \log _{\bar{b}}\left( \prod _{l=1}^{h}\alpha _{l}\right) <\sum _{l=1}^{\log _{\bar{b}}(d_{ij})}\log _{\bar{b}}(\alpha _{l})=-\sum _{l=1}^{\log _{\bar{b}}(d_{ij})}\nu _{l}=-\left\langle \nu \right\rangle \log _{\bar{b}}(d_{ij})=\log _{\bar{b}}\left( d_{ij}^{-\left\langle \nu \right\rangle }\right) \end{aligned}$$

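where $\nu _{l}\equiv \log _{\bar{b}}(1/\alpha _{l})>0$ and $\left\langle \nu \right\rangle$ is the average of $\nu _{l}$ over the first $\log _{\bar{b}}(d_{ij})$ levels. Hence $\prod _{l=1}^{h}\alpha _{l}<d_{ij}^{-\left\langle \nu \right\rangle }$, and inserting this into Eq. (A.3) yields the power law $I(x_{i}:x_{j})\mathop {\le }\limits ^{+}K(q_{h})\, d_{ij}^{-\left\langle \nu \right\rangle }$, which concludes the proof. $\square $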
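To see the mechanism behind the power law, the following Python sketch enumerates all leaf pairs of a perfect tree and computes the height $h$ of each pair's lowest common ancestor. It then checks Eq. (A.4), the consequence $h>\log_b(d_{ij})$, and $\alpha^h<d_{ij}^{-\nu}$. It assumes a constant branching factor $b$ and a constant ratio $\alpha$ (so that $\bar{b}=b$ and $\left\langle \nu \right\rangle =\nu=\log_b(1/\alpha)$); these parameter values and the helper `lca_height` are illustrative assumptions, and levels with varying $\hat{b}_l$ and $\alpha_l$ are not covered.

```python
import math

def lca_height(i: int, j: int, b: int) -> int:
    """Height of the lowest common ancestor of leaves i and j in a perfect b-ary tree.

    Leaves are at height 0 and indexed left to right, so two leaves share an
    ancestor at height h exactly when i // b**h == j // b**h.
    """
    h = 0
    while i != j:
        i, j, h = i // b, j // b, h + 1
    return h

B, H, ALPHA = 3, 5, 0.5          # hypothetical branching factor, tree height, ratio alpha
NU = math.log(1 / ALPHA, B)      # nu = log_b(1/alpha) > 0 because alpha < 1

for i in range(B ** H):
    for j in range(i + 1, B ** H):
        d = j - i                          # index distance d_ij between the leaves
        h = lca_height(i, j, B)
        assert d < B ** h                  # Eq. (A.4) with a constant branching factor
        assert h > math.log(d, B)          # hence h > log_b(d_ij)
        assert ALPHA ** h < d ** (-NU)     # exponential decay in h gives a power law in d
```

The exponential decay in the tree height turns into a power law in the index distance precisely because the index distance itself can grow at most exponentially with that height.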

Proof (Lemma 3). Consider the general expansion [8, Theorem 3.9.1, p. 247]

$$\begin{aligned} K(xy)\mathop {=}\limits ^{+}K(x)+K(y|x,K(x)) \end{aligned}$$

Here the mutual information is defined by $I(x:y)\equiv K(y)-K(y|x)$ and is positive by assumption. Since in general $K(y|x,K(x))\mathop {\le }\limits ^{+}K(y|x)$, we obtain

$$\begin{aligned} \begin{aligned} K(xy)&\mathop {=}\limits ^{+}K(x)+K(y)+K(y|x,K(x))-K(y|x)-I(x:y)\\&\mathop {<}\limits ^{+}K(x)+K(y)\mathop {\le }\limits ^{+}l(x)+l(y)=l(xy) \end{aligned} \end{aligned}$$

$\square $
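Kolmogorov complexity is not computable, but the output length of any fixed compressor upper-bounds it up to an additive constant (the size of the decompressor). The Python sketch below uses `zlib` as such a stand-in, an assumption that serves only to illustrate Lemma 3: two strings that share information compress better together than separately, and their concatenation becomes shorter than its raw length. The test strings are synthetic and the helper `compressed_bits` is hypothetical.

```python
import random
import zlib

def compressed_bits(s: bytes) -> int:
    """zlib output length in bits: a computable stand-in that upper-bounds K(s) up to a constant."""
    return 8 * len(zlib.compress(s, 9))

rng = random.Random(0)
x = bytes(rng.getrandbits(8) for _ in range(2000))              # pseudo-random, essentially incompressible for zlib
y = x[:1000] + bytes(rng.getrandbits(8) for _ in range(1000))   # shares its first half with x

c_x, c_y, c_xy = compressed_bits(x), compressed_bits(y), compressed_bits(x + y)
raw_xy = 8 * len(x + y)                                         # l(xy) in bits
print(f"C(x)={c_x}  C(y)={c_y}  C(xy)={c_xy}  l(xy)={raw_xy}")

assert c_xy < c_x + c_y    # shared information is exploited, as in K(xy) < K(x) + K(y)
assert c_xy < raw_xy       # the concatenation is compressible, as in K(xy) < l(xy)
```

Neither `x` nor `y` is compressible on its own here; the saving comes entirely from the information they share, which is the situation Lemma 3 describes.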

Proof (Theorem 2). Since $y$ is $l(\lambda )$-compressible by $q$, $\lambda (q,p)=U\left( \left\langle \lambda ,q,p\right\rangle \right) =x$, and $l(x)=l(y)+l(p)$, the string $x$ is compressible as well:

$$\begin{aligned} K(x)\le l(\lambda )+l(q)+l(p)=l(\lambda )+K(y)+l(x)-l(y)<l(x) \end{aligned}$$

We define $f\equiv \left\langle \lambda ,q\right\rangle$ and obtain the main feature equation $U(\left\langle f,p\right\rangle )=f(p)=x$. The descriptive map $f'$ can be defined as a function that removes $y$ from $x$ to obtain the remainder $p$: $f'(x)=p$. It suffices that it does so for this particular $x$ and $y$; it need not work in general.

From the definition of $f$, we get $l(f)=l(\lambda )+l(q)=l(\lambda )+K(y)<l(y)$, since $y$ is $l(\lambda )$-compressible by assumption. It follows that the pair $(f,p)$ compresses $x$ at least to some extent: $l(f)+l(p)<l(y)+l(p)=l(x)$. Therefore, $f$ is indeed a feature of $x$, and its length is bounded by $l(y)$. $\square $
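The construction of $f$ and $f'$ can be mimicked with a concrete decompressor. In this hypothetical Python sketch, the `zlib` output `q` plays the role of the short description of a compressible $y$, the function `f` plays $\left\langle \lambda ,q\right\rangle$ (decode $q$, then append the parameter), and `f_desc` plays the descriptive map $f'$, which only needs to work for this particular $x$. Byte lengths are not Kolmogorov complexities and the length of $\lambda$ is not accounted for; the sketch only shows the shape of the feature equation $f(p)=x$.

```python
import zlib

# Hypothetical instance: y is compressible (highly repetitive), p is an arbitrary remainder.
y = b"abc" * 400
p = bytes(range(256))
x = y + p                               # l(x) = l(y) + l(p)

q = zlib.compress(y, 9)                 # stands in for a short description of y
assert len(q) < len(y)                  # y is compressible

def f(p_: bytes) -> bytes:
    """Feature f = <lambda, q>: decode q back to y, then append the parameter."""
    return zlib.decompress(q) + p_

def f_desc(x_: bytes) -> bytes:
    """Descriptive map f': strip the known prefix y to recover p (valid for this x only)."""
    assert x_.startswith(y)
    return x_[len(y):]

assert f(p) == x                        # main feature equation f(p) = x
assert f_desc(x) == p                   # f'(x) = p
```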

Proof (Theorem 3). In general, the relation $K(p)\mathop {\le }\limits ^{+}K(p|z)+K(z)$ holds: if $p$ can be computed by a detour via $z$, its shortest program without the detour can only be shorter. Setting $z=K(x)$ and conditioning on $x$ leads to

$$K(p|x)\mathop {\le }\limits ^{+}K(p|K(x),x)+K(K(x)|x) \tag{A.5}$$

Conditioning both sides of an inequality on $x$ is not valid in general; here, however, the detour argument still applies. Since $K(p|x)=l(f')$ [4, Lemma 1(2)] and $K(p|K(x),x)=O(1)$ [4, Theorem 3(3)], we get

$$l(f')\mathop {\le }\limits ^{+}K(K(x)|x) \tag{A.6}$$

Inserting the bound on the “complexity of the complexity” from [8, Lemma 3.9.2, Eq. (3.18)], $K(K(x)|x)\mathop {\le }\limits ^{+}\log K(x)+2\log \log K(x)$, yields the first claim. The second claim is a property of $K(K(x)|x)$ [8, Eq. (3.13)] and therefore also holds for $l(f')$. $\square $
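To give a sense of scale for the first claim, the following Python sketch evaluates $\log K(x)+2\log \log K(x)$ for a few values of $K(x)$. Taking logarithms to base 2 and ignoring the additive constant are assumptions, and the function name `descriptive_map_bound` is hypothetical; the output merely shows that the descriptive map $f'$ stays logarithmically short compared to $K(x)$.

```python
import math

def descriptive_map_bound(k_x: float) -> float:
    """log K(x) + 2 log log K(x) in bits (base 2): the bound on l(f') up to an additive constant."""
    return math.log2(k_x) + 2 * math.log2(math.log2(k_x))

for k_x in (2**10, 2**16, 10**6, 10**9):
    print(f"K(x) = {k_x:>12} bits -> l(f') <~ {descriptive_map_bound(k_x):6.2f} bits")
```

For example, $K(x)=2^{16}$ bits gives $16+2\cdot 4=24$ bits, so removing the feature's contribution from $x$ costs almost nothing compared to describing $x$ itself.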


About this paper

Cite this paper

Franz, A. (2017). On Hierarchical Compression and Power Laws in Nature. In: Everitt, T., Goertzel, B., Potapov, A. (eds) Artificial General Intelligence. AGI 2017. Lecture Notes in Computer Science(), vol 10414. Springer, Cham. https://doi.org/10.1007/978-3-319-63703-7_8


DOI: https://doi.org/10.1007/978-3-319-63703-7_8
Published: 15 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63702-0
Online ISBN: 978-3-319-63703-7
eBook Packages: Computer Science, Computer Science (R0)
