Complexity and Information Theory


Abstract

What do we mean when we say that a given system shows “complex behavior”, and can we provide precise measures for the degree of complexity? This chapter offers an account of several common measures of complexity and of the relation of complexity to predictability and emergence.

Notes

  1. In some areas, such as the neurosciences or artificial intelligence, the term “Bayesian” is used for approaches based on statistical methods, in particular in the context of hypothesis building, where estimates of probability distribution functions are derived from observations.

  2. The expression \(p(x_i)\) is therefore context specific and can denote both a properly normalized discrete distribution function and the value of a continuous probability distribution function.

  3. In formal texts on statistics and information theory the notation \(\mu=E(X)\) is often used for the mean μ and the expectation value \(E(X)\) of a random variable X, where X represents the abstract random variable, x denotes its particular value, and \(p_X(x)\) the probability distribution.

  4. Please note the difference between a cumulative stochastic process, obtained by adding the results of individual trials, and the “cumulative PDF” \(F(x)\) defined by \(F(x)=\int_{-\infty}^x p(x')\,dx'\).

  5. For continuous-time data, such as an electrocardiogram, an additional symbolization step is necessary, namely the discretization of time. Here, however, we consider only discrete-time series.

  6. Remember that \(\textrm{XOR}(0,0)=0=\textrm{XOR}(1,1)\) and \(\textrm{XOR}(0,1)=1=\textrm{XOR}(1,0)\).

  7. A function \(f(x)\) is a function of a variable x; a functional \(F[f]\), on the other hand, depends functionally on a function \(f(x)\). In formal texts on information theory the notation \(H(X)\) is often used for the Shannon entropy of a random variable X with probability distribution \(p_X(x)\).

  8. For a proof consider the generic substitution \(x\to q(x)\) and a transformation of variables \(x\to q\) via \(dx=dq/q'\), with \(q'=dq(x)/dx\), for the integration in Eq. (3.43).

Further Reading

  • We recommend for further reading introductions to information theory (Cover and Thomas, 2006), to Bayesian statistics (Bolstad, 2004), to complex system theory in general (Boccara, 2003), and to algorithmic complexity (Li and Vitanyi, 1997).

  • For further study we recommend several review articles: on the evolutionary development of complexity in organisms (Adami, 2002), on complexity and predictability (Boffetta et al., 2002), a critical assessment of various complexity measures (Olbrich et al., 2008), and a thoughtful discussion of various approaches to the notion of complexity (Manson, 2001).

  • For some further, somewhat more specialized topics, we recommend Binder (2008) for a perspective on the interplay between dynamical frustration and complexity, Binder (2009) for the question of decidability in complex systems, and Tononi and Edelman (1998) on possible interrelations between consciousness and complexity.


  • Adami, C. 2002 What is complexity? BioEssays 24, 1085–1094.

  • Binder, P.-M. 2008 Frustration in complexity. Science 320, 322–323.

  • Binder, P.-M. 2009 The edge of reductionism. Nature 459, 332–334.

  • Boccara, N. 2003 Modeling Complex Systems. Springer, Berlin.

  • Boffetta, G., Cencini, M., Falcioni, M., Vulpiani, A. 2002 Predictability: A way to characterize complexity. Physics Reports 356, 367–474.

  • Bolstad, W.M. 2004 Introduction to Bayesian Statistics. Wiley-IEEE, Hoboken, NJ.

  • Cover, T.M., Thomas, J.A. 2006 Elements of Information Theory. Wiley-Interscience, Hoboken, NJ.

  • Li, M., Vitanyi, P.M.B. 1997 An Introduction to Kolmogorov Complexity and Its Applications. Springer, Berlin.

  • Manson, S.M. 2001 Simplifying complexity: A review of complexity theory. Geoforum 32, 405–414.

  • Olbrich, E., Bertschinger, N., Ay, N., Jost, J. 2008 How should complexity scale with system size? The European Physical Journal B 63, 407–415.

  • Tononi, G., Edelman, G.M. 1998 Consciousness and complexity. Science 282, 1846.

Author information

Correspondence to Claudius Gros.

Exercises

3.1.1 The Law of Large Numbers

Generalize the derivation of the law of large numbers given in Sect. 3.1.1 to the case of \(i=1,\dots,N\) independent discrete stochastic processes \(p^{(i)}_k\), described by their respective generating functionals \(G_i(x)=\sum_k p^{(i)}_k x^k\).
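
One possible starting point, using the standard fact (not specific to this text) that the generating function of a sum of independent processes factorizes:

$$G(x)\ =\ \prod_{i=1}^N G_i(x),\qquad \langle k\rangle\ =\ G'(1)\ =\ \sum_{i=1}^N G_i'(1),\qquad \sigma^2\ =\ \sum_{i=1}^N \sigma_i^2,$$

so that, for individual means and variances of comparable magnitude, the relative width \(\sigma/\langle k\rangle\) of the cumulative process again vanishes as \(1/\sqrt{N}\).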

3.1.2 Symbolization of Financial Data

Generalize the symbolization procedure defined for the joint probabilities \(p_{\pm\pm}\) of Eq. (3.15) to joint probabilities \(p_{\pm\pm\pm}\); e.g. \(p_{+++}\) would measure the probability of three consecutive increases. Download from the Internet the historical data for your favorite financial asset, such as the Dow Jones or the Nasdaq stock index, and analyze it with this symbolization procedure. Discuss whether it would be possible, as a matter of principle, to develop a money-making scheme in this way.
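
A minimal numerical sketch of the extended symbolization; the synthetic random-walk series stands in for downloaded data, and the mapping of daily closing prices to ± symbols is an implementation choice, not part of the exercise statement.

    # Sketch: estimate the joint probabilities p_{s1 s2 s3} of three consecutive
    # price moves.  A synthetic random walk is used as a stand-in; replace
    # `prices` with a downloaded closing-price series (e.g. the Dow Jones).
    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(0)
    prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(10_000)))

    signs = np.sign(np.diff(prices))        # +1 for an increase, -1 for a decrease
    signs = signs[signs != 0]               # discard (rare) unchanged days
    counts = Counter(zip(signs[:-2], signs[1:-1], signs[2:]))
    total = sum(counts.values())

    for key in sorted(counts):
        label = "".join("+" if s > 0 else "-" for s in key)
        print(f"p_{label} = {counts[key] / total:.4f}")

For an uncorrelated random walk all eight probabilities come out close to 1/8; systematic deviations in real data would be the starting point for the money-making discussion.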

3.1.3 The OR Time Series with Noise

Consider the time series generated by a logical OR, akin to Eq. (3.16). Evaluate the probability \(p(1)\) of finding a 1, with and without averaging over initial conditions, both without noise and in the presence of noise. Discuss the result.
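
A simulation sketch for cross-checking the analytic answer. The concrete update rule \(\sigma_{t+1}=\textrm{OR}(\sigma_t,\sigma_{t-1})\), flipped with probability ξ, is a guess at the structure of Eq. (3.16), which is not reproduced here; adapt it to the actual equation.

    # Sketch: Monte-Carlo estimate of p(1) for a noisy binary OR time series.
    # The update rule and the bit-flip noise of strength xi are assumptions.
    import random

    def p_one(xi, steps=100_000, sigma0=0, sigma1=0):
        """Fraction of 1s generated by the noisy OR series."""
        s_prev, s = sigma0, sigma1
        ones = 0
        for _ in range(steps):
            s_next = s_prev | s              # OR of the two preceding bits
            if random.random() < xi:         # noise: flip the outcome
                s_next ^= 1
            ones += s_next
            s_prev, s = s, s_next
        return ones / steps

    for xi in (0.0, 0.01, 0.1):
        # average over the four possible initial conditions
        avg = sum(p_one(xi, sigma0=a, sigma1=b) for a in (0, 1) for b in (0, 1)) / 4
        print(f"xi = {xi}:  p(1) fixed start = {p_one(xi):.3f},  averaged = {avg:.3f}")

Without noise the result depends strongly on the initial condition (the all-zero start never produces a 1), while any amount of noise makes the series forget its initial state.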

3.1.4 Maximal Entropy Distribution Function

Determine the probability distribution function \(p(x)\) which maximizes the Shannon entropy for a given mean μ and a given variance \(\sigma^2\); compare Eq. (3.32).
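
A hint on the standard variational setup (the \(\lambda_i\) are Lagrange multipliers enforcing normalization, mean and variance):

$$\frac{\delta}{\delta p(x)}\left[-\int p\,\ln p\,dx +\lambda_0\int p\,dx +\lambda_1\int x\,p\,dx +\lambda_2\int x^2 p\,dx\right]\ =\ 0 \quad\Rightarrow\quad p(x)\ \propto\ \textrm{e}^{\lambda_1 x+\lambda_2 x^2},$$

with \(\lambda_2<0\) fixed in the end by the variance constraint.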

3.1.5 Two-Channel Markov Process

Consider, in analogy to Eq. (3.34), the two-channel Markov process \(\{\sigma_t,\tau_t\}\),

$$\sigma_{t+1}\ =\ \textrm{AND}(\sigma_t,\tau_t), \qquad \tau_{t+1}\ =\ \left\{ \begin{array}{lcl} \textrm{OR}(\sigma_t,\tau_t) &\quad& \textrm{probability}\ 1-\alpha \\ \neg\,\textrm{OR}(\sigma_t,\tau_t) &\quad& \textrm{probability}\ \alpha \end{array} \right..$$

Evaluate the joint and marginal distribution functions, the respective entropies and the resulting mutual information. Discuss the result as a function of noise strength α.
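
A simulation sketch for cross-checking the analytic result; the initial condition and the Monte-Carlo estimate of the stationary distribution are implementation choices.

    # Sketch: estimate the stationary joint distribution p(sigma, tau), the
    # marginal entropies and the mutual information of the noisy AND/OR
    # two-channel Markov process defined above.
    import random
    from math import log2
    from collections import Counter

    def mutual_information(alpha, steps=200_000):
        s, t = 1, 0                              # arbitrary initial condition
        counts = Counter()
        for _ in range(steps):
            s_new = s & t                        # sigma_{t+1} = AND(sigma_t, tau_t)
            t_new = s | t                        # tau_{t+1}   = OR(sigma_t, tau_t) ...
            if random.random() < alpha:          # ... negated with probability alpha
                t_new ^= 1
            s, t = s_new, t_new
            counts[(s, t)] += 1

        p_joint = {k: v / steps for k, v in counts.items()}
        p_s = {b: sum(p for (si, _), p in p_joint.items() if si == b) for b in (0, 1)}
        p_t = {b: sum(p for (_, ti), p in p_joint.items() if ti == b) for b in (0, 1)}
        H = lambda dist: -sum(p * log2(p) for p in dist.values() if p > 0)
        return H(p_s) + H(p_t) - H(p_joint)      # I(sigma; tau) in bits

    for alpha in (0.0, 0.1, 0.3, 0.5):
        print(f"alpha = {alpha}:  I = {mutual_information(alpha):.4f} bits")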

3.1.6 Kullback-Leibler Divergence

Try to approximate an exponential distribution function by a scale-invariant PDF, considering the Kullback-Leibler divergence \(K[p;q]\), Eq. (3.45), for the two normalized PDFs

$$p(x)\ =\ \textrm{e}^{-(x-1)},\qquad q(x)\ =\ \frac{\gamma-1}{x^\gamma},\qquad x,\,\gamma\,>\,1.$$

Which exponent γ minimizes \(K[p;q]\)? How many times do the graphs for \(p(x)\) and \(q(x)\) cross?
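
A numerical sketch of the minimization; scipy is used for the integral and for the one-dimensional search, and the search bracket for γ is an arbitrary choice.

    # Sketch: minimize K[p;q] = \int_1^infty p(x) ln(p(x)/q(x)) dx over gamma,
    # for p(x) = exp(-(x-1)) and q(x) = (gamma-1)/x**gamma on x > 1.
    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import minimize_scalar

    def kl(gamma):
        integrand = lambda x: np.exp(-(x - 1.0)) * (
            -(x - 1.0) - np.log(gamma - 1.0) + gamma * np.log(x)
        )
        value, _ = quad(integrand, 1.0, np.inf)
        return value

    res = minimize_scalar(kl, bounds=(1.001, 10.0), method="bounded")
    print(f"gamma* = {res.x:.3f},  K[p;q] = {res.fun:.4f}")

Plotting \(p(x)\) and \(q(x)\) for the optimal γ answers the crossing question graphically.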

3.1.7 Chi-Squared Test

The quantity

$$\chi^2[p;q]\ =\ \sum_{i=1}^N \frac{(p_i-q_i)^2}{p_i}$$
(3.54)

measures the similarity of two normalized probability distribution functions \(p_i\) and \(q_i\). Show that the Kullback-Leibler divergence \(K[p;q]\), Eq. (3.45), reduces to \(\chi^2[p;q]/2\) when the two distributions are very similar.
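
A quick numerical sanity check of this limit; the multiplicative perturbation used to construct q from p is an arbitrary illustration.

    # Sketch: verify that K[p;q] approaches chi^2[p;q]/2 when q differs from p
    # only by a small perturbation of strength eps.
    import numpy as np

    rng = np.random.default_rng(1)
    p = rng.random(6)
    p /= p.sum()                                      # normalized reference distribution

    for eps in (1e-1, 1e-2, 1e-3):
        q = p * (1.0 + eps * (rng.random(6) - 0.5))   # small perturbation
        q /= q.sum()                                  # keep q normalized
        kl = np.sum(p * np.log(p / q))                # K[p;q]
        chi2 = np.sum((p - q) ** 2 / p)               # Eq. (3.54)
        print(f"eps = {eps:.0e}:  K = {kl:.3e},  chi^2/2 = {chi2 / 2:.3e}")

The ratio of the two quantities tends to unity as ε decreases.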

3.1.8 Excess Entropy

Use the representation

$$E = \lim_{n\to\infty} E_n,\qquad E_n\ \approx\ H[p_n]\,-\,n\big(H[p_{n+1}]-H[p_{n}]\big)$$

to prove that \(E\ge0\), compare Eqs. (3.51) and (3.53), as long as \(H[p_n]\) is concave as a function of n.
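
One way the concavity enters, assuming in addition \(H[p_0]=0\) for the empty block (a standard convention not restated above): concavity means that the increments \(H[p_{k+1}]-H[p_k]\) are non-increasing in k, so

$$H[p_n]\ =\ \sum_{k=0}^{n-1}\big(H[p_{k+1}]-H[p_k]\big)\ \ge\ n\,\big(H[p_{n+1}]-H[p_n]\big),$$

and therefore \(E_n\ge0\) for every n.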

3.1.9 Tsallis Entropy

The “Tsallis Entropy”

$$H_q[p] \ =\ \frac{1}{1-q}\sum_k\left[ \big(p_k\big)^q-p_k\right], \qquad 0 <q\le 1$$

of a probability distribution function p is a popular non-extensive generalization of the Shannon entropy \(H[p]\). Prove that

$$\lim_{q\to1} H_q[p] \ =\ H[p], \qquad H_q[p]\ \ge\ 0,$$

and the non-extensiveness

$$H_q[p]\ =\ H_q[p_X]+ H_q[p_Y]+(1-q)\,H_q[p_X]\, H_q[p_Y], \qquad p = p_Xp_Y$$

for two statistically independent systems X and Y. For which distribution function p is \(H_q[p]\) maximal?
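
A short numerical check of the Shannon limit and of the composition rule for independent systems; the example distributions are arbitrary.

    # Sketch: check lim_{q->1} H_q[p] = H[p] and the non-extensive composition
    # rule for a product distribution p = p_X p_Y (natural logarithm for H[p]).
    import numpy as np

    def tsallis(p, q):
        p = np.asarray(p, dtype=float)
        if abs(q - 1.0) < 1e-12:                 # Shannon limit
            return -np.sum(p[p > 0] * np.log(p[p > 0]))
        return np.sum(p**q - p) / (1.0 - q)

    p_X = np.array([0.2, 0.3, 0.5])
    p_Y = np.array([0.6, 0.4])
    p_XY = np.outer(p_X, p_Y).ravel()            # statistically independent joint

    for q in (0.9999, 0.9, 0.5):
        lhs = tsallis(p_XY, q)
        rhs = (tsallis(p_X, q) + tsallis(p_Y, q)
               + (1 - q) * tsallis(p_X, q) * tsallis(p_Y, q))
        print(f"q = {q}:  H_q[p_XY] = {lhs:.5f},  composition rule = {rhs:.5f}")

    print("Shannon entropy of p_XY:", tsallis(p_XY, 1.0))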

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gros, C. (2011). Complexity and Information Theory. In: Complex and Adaptive Dynamical Systems. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04706-0_3

  • Print ISBN: 978-3-642-04705-3

  • Online ISBN: 978-3-642-04706-0
