Complexity and Information Theory


Abstract

What do we mean when we say that a given system shows “complex behavior”, and can we provide precise measures for the degree of complexity? This chapter offers an account of several common measures of complexity and of the relation of complexity to predictability and emergence.

Notes

  1. In some areas, such as the neurosciences or artificial intelligence, the term “Bayesian” is used for approaches based on statistical methods, in particular in the context of hypothesis building, where estimates of probability distribution functions are derived from observations.

  2. The expression \(p(x_i)\) is therefore context specific and can denote both a properly normalized discrete distribution function and the value of a continuous probability distribution function.

  3. In formal texts on statistics and information theory the notation \(\mu=E(X)\) is often used for the mean μ and the expectation value \(E(X)\) of a random variable X, where X represents the abstract random variable, x denotes its particular value, and \(p_X(x)\) the probability distribution.

  4. Please note the difference between a cumulative stochastic process, obtained by adding the results of individual trials, and the “cumulative PDF” \(F(x)\) defined by \(F(x)=\int_{-\infty}^x p(x')\,dx'\).

  5. For continuous-time data, such as an electrocardiogram, an additional symbolization step is necessary, namely the discretization of time. Here, however, we consider only discrete-time series.

  6. Remember that \(\textrm{XOR}(0,0)=0=\textrm{XOR}(1,1)\) and \(\textrm{XOR}(0,1)=1=\textrm{XOR}(1,0)\).

  7. A function \(f(x)\) is a function of a variable x; a functional \(F[f]\), on the other hand, depends functionally on a function \(f(x)\). In formal texts on information theory the notation \(H(X)\) is often used for the Shannon entropy of a random variable X with probability distribution \(p_X(x)\).

  8. For a proof consider the generic substitution \(x\to q(x)\) and a transformation of variables \(x\to q\) via \(dx=dq/q'\), with \(q'=dq(x)/dx\), for the integration in Eq. (3.43).

Further Reading

  • We recommend for further reading introductions to information theory (Cover and Thomas, 2006), to Bayesian statistics (Bolstad, 2004), to complex system theory in general (Boccara, 2003), and to algorithmic complexity (Li and Vitanyi, 1997).

  • For further study we recommend several review articles: on the evolutionary development of complexity in organisms (Adami, 2002), on complexity and predictability (Boffetta et al., 2002), a critical assessment of various complexity measures (Olbrich et al., 2008), and a thoughtful discussion of various approaches to the notion of complexity (Manson, 2001).

  • For some further, somewhat more specialized topics, we recommend Binder (2008) for a perspective on the interplay between dynamical frustration and complexity, Binder (2009) for the question of decidability in complex systems, and Tononi and Edelman (1998) on possible interrelations between consciousness and complexity.


  • Adami, C. 2002 What is complexity? BioEssays 24, 1085–1094.

  • Binder, P.-M. 2008 Frustration in complexity. Science 320, 322–323.

  • Binder, P.-M. 2009 The edge of reductionism. Nature 459, 332–334.

  • Boccara, N. 2003 Modeling Complex Systems. Springer, Berlin.

  • Boffetta, G., Cencini, M., Falcioni, M., Vulpiani, A. 2002 Predictability: A way to characterize complexity. Physics Reports 356, 367–474.

  • Bolstad, W.M. 2004 Introduction to Bayesian Statistics. Wiley-IEEE, Hoboken, NJ.

  • Cover, T.M., Thomas, J.A. 2006 Elements of Information Theory. Wiley-Interscience, Hoboken, NJ.

  • Li, M., Vitanyi, P.M.B. 1997 An Introduction to Kolmogorov Complexity and Its Applications. Springer, Berlin.

  • Manson, S.M. 2001 Simplifying complexity: A review of complexity theory. Geoforum 32, 405–414.

  • Olbrich, E., Bertschinger, N., Ay, N., Jost, J. 2008 How should complexity scale with system size? The European Physical Journal B 63, 407–415.

  • Tononi, G., Edelman, G.M. 1998 Consciousness and complexity. Science 282, 1846.

Author information

Correspondence to Claudius Gros.

Exercises

3.1.1 The Law of Large Numbers

Generalize the derivation of the law of large numbers given in Sect. 3.1.1 to the case of \(i=1,\dots,N\) independent discrete stochastic processes \(p^{(i)}_k\), described by their respective generating functionals \(G_i(x)=\sum_k p^{(i)}_k x^k\).
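
One possible starting point, using the standard fact (not specific to this text) that the generating function of a sum of independent processes factorizes:

$$G(x)\ =\ \prod_{i=1}^N G_i(x),\qquad \langle k\rangle\ =\ G'(1)\ =\ \sum_{i=1}^N G_i'(1),\qquad \sigma^2\ =\ \sum_{i=1}^N \sigma_i^2,$$

so that, for individual means and variances of comparable magnitude, the relative width \(\sigma/\langle k\rangle\) of the cumulative process again vanishes as \(1/\sqrt{N}\).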

3.1.2 Symbolization of Financial Data

Generalize the symbolization procedure defined for the joint probabilities \(p_{\pm\pm}\) of Eq. (3.15) to joint probabilities \(p_{\pm\pm\pm}\); e.g. \(p_{+++}\) would measure the probability of three consecutive increases. Download from the Internet the historical data for your favorite financial asset, such as the Dow Jones or the Nasdaq stock index, and analyze it with this symbolization procedure. Discuss whether it would be possible, as a matter of principle, to develop a money-making scheme in this way.
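
A minimal numerical sketch of the extended symbolization; the synthetic random-walk series stands in for downloaded data, and the mapping of daily closing prices to ± symbols is an implementation choice, not part of the exercise statement.

    # Sketch: estimate the joint probabilities p_{s1 s2 s3} of three consecutive
    # price moves.  A synthetic random walk is used as a stand-in; replace
    # `prices` with a downloaded closing-price series (e.g. the Dow Jones).
    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(0)
    prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(10_000)))

    signs = np.sign(np.diff(prices))        # +1 for an increase, -1 for a decrease
    signs = signs[signs != 0]               # discard (rare) unchanged days
    counts = Counter(zip(signs[:-2], signs[1:-1], signs[2:]))
    total = sum(counts.values())

    for key in sorted(counts):
        label = "".join("+" if s > 0 else "-" for s in key)
        print(f"p_{label} = {counts[key] / total:.4f}")

For an uncorrelated random walk all eight probabilities come out close to 1/8; systematic deviations in real data would be the starting point for the money-making discussion.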

3.1.3 The OR Time Series with Noise

Consider the time series generated by a logical OR, akin to Eq. (3.16). Evaluate the probability \(p(1)\) of finding a 1, with and without averaging over initial conditions, both without noise and in the presence of noise. Discuss the result.
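
A simulation sketch for cross-checking the analytic answer. The concrete update rule \(\sigma_{t+1}=\textrm{OR}(\sigma_t,\sigma_{t-1})\), flipped with probability ξ, is a guess at the structure of Eq. (3.16), which is not reproduced here; adapt it to the actual equation.

    # Sketch: Monte-Carlo estimate of p(1) for a noisy binary OR time series.
    # The update rule and the bit-flip noise of strength xi are assumptions.
    import random

    def p_one(xi, steps=100_000, sigma0=0, sigma1=0):
        """Fraction of 1s generated by the noisy OR series."""
        s_prev, s = sigma0, sigma1
        ones = 0
        for _ in range(steps):
            s_next = s_prev | s              # OR of the two preceding bits
            if random.random() < xi:         # noise: flip the outcome
                s_next ^= 1
            ones += s_next
            s_prev, s = s, s_next
        return ones / steps

    for xi in (0.0, 0.01, 0.1):
        # average over the four possible initial conditions
        avg = sum(p_one(xi, sigma0=a, sigma1=b) for a in (0, 1) for b in (0, 1)) / 4
        print(f"xi = {xi}:  p(1) fixed start = {p_one(xi):.3f},  averaged = {avg:.3f}")

Without noise the result depends strongly on the initial condition (the all-zero start never produces a 1), while any amount of noise makes the series forget its initial state.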

3.1.4 Maximal Entropy Distribution Function

Determine the probability distribution function \(p(x)\) which maximizes the Shannon entropy for a given mean μ and a given variance \(\sigma^2\); compare Eq. (3.32).
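
A hint on the standard variational setup (the \(\lambda_i\) are Lagrange multipliers enforcing normalization, mean and variance):

$$\frac{\delta}{\delta p(x)}\left[-\int p\,\ln p\,dx +\lambda_0\int p\,dx +\lambda_1\int x\,p\,dx +\lambda_2\int x^2 p\,dx\right]\ =\ 0 \quad\Rightarrow\quad p(x)\ \propto\ \textrm{e}^{\lambda_1 x+\lambda_2 x^2},$$

with \(\lambda_2<0\) fixed in the end by the variance constraint.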

3.1.5 Two-Channel Markov Process

Consider, in analogy to Eq. (3.34), the two-channel Markov process \(\{\sigma_t,\tau_t\}\),

$$\sigma_{t+1}\ =\ \textrm{AND}(\sigma_t,\tau_t), \qquad \tau_{t+1}\ =\ \left\{ \begin{array}{lcl} \textrm{OR}(\sigma_t,\tau_t) &\quad& \textrm{probability}\ 1-\alpha \\ \neg\,\textrm{OR}(\sigma_t,\tau_t) &\quad& \textrm{probability}\ \alpha \end{array} \right..$$

Evaluate the joint and marginal distribution functions, the respective entropies and the resulting mutual information. Discuss the result as a function of noise strength α.
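
A simulation sketch for cross-checking the analytic result; the initial condition and the Monte-Carlo estimate of the stationary distribution are implementation choices.

    # Sketch: estimate the stationary joint distribution p(sigma, tau), the
    # marginal entropies and the mutual information of the noisy AND/OR
    # two-channel Markov process defined above.
    import random
    from math import log2
    from collections import Counter

    def mutual_information(alpha, steps=200_000):
        s, t = 1, 0                              # arbitrary initial condition
        counts = Counter()
        for _ in range(steps):
            s_new = s & t                        # sigma_{t+1} = AND(sigma_t, tau_t)
            t_new = s | t                        # tau_{t+1}   = OR(sigma_t, tau_t) ...
            if random.random() < alpha:          # ... negated with probability alpha
                t_new ^= 1
            s, t = s_new, t_new
            counts[(s, t)] += 1

        p_joint = {k: v / steps for k, v in counts.items()}
        p_s = {b: sum(p for (si, _), p in p_joint.items() if si == b) for b in (0, 1)}
        p_t = {b: sum(p for (_, ti), p in p_joint.items() if ti == b) for b in (0, 1)}
        H = lambda dist: -sum(p * log2(p) for p in dist.values() if p > 0)
        return H(p_s) + H(p_t) - H(p_joint)      # I(sigma; tau) in bits

    for alpha in (0.0, 0.1, 0.3, 0.5):
        print(f"alpha = {alpha}:  I = {mutual_information(alpha):.4f} bits")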

3.1.6 Kullback-Leibler Divergence

Try to approximate an exponential distribution function by a scale-invariant PDF, considering the Kullback-Leibler divergence \(K[p;q]\), Eq. (3.45), for the two normalized PDFs

$$p(x)\ =\ \textrm{e}^{-(x-1)},\qquad q(x)\ =\ \frac{\gamma-1}{x^\gamma},\qquad x,\,\gamma\,>\,1.$$

Which exponent γ minimizes \(K[p;q]\)? How many times do the graphs for \(p(x)\) and \(q(x)\) cross?
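
A numerical sketch of the minimization; scipy is used for the integral and for the one-dimensional search, and the search bracket for γ is an arbitrary choice.

    # Sketch: minimize K[p;q] = \int_1^infty p(x) ln(p(x)/q(x)) dx over gamma,
    # for p(x) = exp(-(x-1)) and q(x) = (gamma-1)/x**gamma on x > 1.
    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import minimize_scalar

    def kl(gamma):
        integrand = lambda x: np.exp(-(x - 1.0)) * (
            -(x - 1.0) - np.log(gamma - 1.0) + gamma * np.log(x)
        )
        value, _ = quad(integrand, 1.0, np.inf)
        return value

    res = minimize_scalar(kl, bounds=(1.001, 10.0), method="bounded")
    print(f"gamma* = {res.x:.3f},  K[p;q] = {res.fun:.4f}")

Plotting \(p(x)\) and \(q(x)\) for the optimal γ answers the crossing question graphically.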

3.1.7 Chi-Squared Test

The quantity

$$\chi^2[p;q]\ =\ \sum_{i=1}^N \frac{(p_i-q_i)^2}{p_i}$$
(3.54)

measures the similarity of two normalized probability distribution functions \(p_i\) and \(q_i\). Show that the Kullback-Leibler divergence \(K[p;q]\), Eq. (3.45), reduces to \(\chi^2[p;q]/2\) when the two distributions are very similar.
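
A quick numerical sanity check of this limit; the multiplicative perturbation used to construct q from p is an arbitrary illustration.

    # Sketch: verify that K[p;q] approaches chi^2[p;q]/2 when q differs from p
    # only by a small perturbation of strength eps.
    import numpy as np

    rng = np.random.default_rng(1)
    p = rng.random(6)
    p /= p.sum()                                      # normalized reference distribution

    for eps in (1e-1, 1e-2, 1e-3):
        q = p * (1.0 + eps * (rng.random(6) - 0.5))   # small perturbation
        q /= q.sum()                                  # keep q normalized
        kl = np.sum(p * np.log(p / q))                # K[p;q]
        chi2 = np.sum((p - q) ** 2 / p)               # Eq. (3.54)
        print(f"eps = {eps:.0e}:  K = {kl:.3e},  chi^2/2 = {chi2 / 2:.3e}")

The ratio of the two quantities tends to unity as ε decreases.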

3.1.8 Excess Entropy

Use the representation

$$E = \lim_{n\to\infty} E_n,\qquad E_n\ \approx\ H[p_n]\,-\,n\big(H[p_{n+1}]-H[p_{n}]\big)$$

to prove that \(E\ge0\), compare Eqs. (3.51) and (3.53), as long as \(H[p_n]\) is concave as a function of n.
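
One way the concavity enters, assuming in addition \(H[p_0]=0\) for the empty block (a standard convention not restated above): concavity means that the increments \(H[p_{k+1}]-H[p_k]\) are non-increasing in k, so

$$H[p_n]\ =\ \sum_{k=0}^{n-1}\big(H[p_{k+1}]-H[p_k]\big)\ \ge\ n\,\big(H[p_{n+1}]-H[p_n]\big),$$

and therefore \(E_n\ge0\) for every n.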

3.1.9 Tsallis Entropy

The “Tsallis Entropy”

$$H_q[p] \ =\ \frac{1}{1-q}\sum_k\left[ \big(p_k\big)^q-p_k\right], \qquad 0 <q\le 1$$

of a probability distribution function p is a popular non-extensive generalization of the Shannon entropy \(H[p]\). Prove that

$$\lim_{q\to1} H_q[p] \ =\ H[p], \qquad H_q[p]\ \ge\ 0,$$

and the non-extensiveness

$$H_q[p]\ =\ H_q[p_X]+ H_q[p_Y]+(1-q)\,H_q[p_X]\, H_q[p_Y], \qquad p = p_Xp_Y$$

for two statistically independent systems X and Y. For which distribution function p is \(H_q[p]\) maximal?
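
A short numerical check of the Shannon limit and of the composition rule for independent systems; the example distributions are arbitrary.

    # Sketch: check lim_{q->1} H_q[p] = H[p] and the non-extensive composition
    # rule for a product distribution p = p_X p_Y (natural logarithm for H[p]).
    import numpy as np

    def tsallis(p, q):
        p = np.asarray(p, dtype=float)
        if abs(q - 1.0) < 1e-12:                 # Shannon limit
            return -np.sum(p[p > 0] * np.log(p[p > 0]))
        return np.sum(p**q - p) / (1.0 - q)

    p_X = np.array([0.2, 0.3, 0.5])
    p_Y = np.array([0.6, 0.4])
    p_XY = np.outer(p_X, p_Y).ravel()            # statistically independent joint

    for q in (0.9999, 0.9, 0.5):
        lhs = tsallis(p_XY, q)
        rhs = (tsallis(p_X, q) + tsallis(p_Y, q)
               + (1 - q) * tsallis(p_X, q) * tsallis(p_Y, q))
        print(f"q = {q}:  H_q[p_XY] = {lhs:.5f},  composition rule = {rhs:.5f}")

    print("Shannon entropy of p_XY:", tsallis(p_XY, 1.0))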

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gros, C. (2011). Complexity and Information Theory. In: Complex and Adaptive Dynamical Systems. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04706-0_3

  • Print ISBN: 978-3-642-04705-3

  • Online ISBN: 978-3-642-04706-0
