# Directed Information Flow and Causality in Neural Systems

**DOI:** https://doi.org/10.1007/978-1-4614-7320-6_141-1

## Keywords

Mutual Information; Directed Information; Granger Causality; Information Measure; Normal Random Variable

## Definition

In the human experience, information typically flows from one place to another. By contrast, the notion of mutual information introduced by Shannon (1948) is perfectly symmetric in its arguments and does not distinguish between “input” and “output.” In this sense, it is perhaps surprising that this very measure of information indeed captures the capacity of any communication channel – though we should recall that the proof of this fundamental fact is not merely a simple consequence of Shannon’s definition.

In spite of Shannon’s strong and fundamental results, it has been tempting to define a notion of *directed* information. This was first proposed by Marko (1973) for stationary processes. A more general and useful definition was given in the brief and beautiful note by Massey (1990). Moreover, Massey (1990), Kramer (1998), and subsequent work revealed that directed information has a natural place in the study of information transmission with feedback from the output to the input.

In the present note, a different aspect of directed information is illuminated: that of identifying causal relationships. The basic idea is that if there is a causal relationship from one process to another, the directed information in the forward direction should be large. Additionally, one may also require the directed information in the reverse direction to be small.

It is important, however, to observe that inferring causality from observations is an ill-posed question. This has been discussed in depth in the context of Granger causality (Granger 1969), where a rich literature exposes several fundamental issues. The same qualitative issues apply to any causality argument based on directed information measures. The difference is that Granger causality uses correlation measures, whereas here, we consider directed information.

## Detailed Description

### Directed Information

The usual mutual information can be defined between any two random variables. For directed information, however, we need to consider *ordered sequences* of random variables.

For two sequences $X = (X_1, X_2, \ldots, X_N)$ and $Y = (Y_1, Y_2, \ldots, Y_N)$ of equal length *N*, the directed information is defined as

$$I(X \to Y) = \sum_{n=1}^{N} I(X^n; Y_n \mid Y^{n-1}),$$

where $X^n = (X_1, \ldots, X_n)$ denotes the first *n* elements of a sequence and *I*(·;·) denotes the regular Shannon mutual information. Let us first record that this is always nonnegative: *I*(*X* → *Y*) ≥ 0, which follows directly from the definition and the fact that Shannon mutual information is always nonnegative. Second, we note that the directed information is never larger than the regular Shannon mutual information between the two sequences:

$$I(X \to Y) \le I(X; Y).$$
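To make the definition concrete, the following minimal sketch evaluates the sum $\sum_{n} I(X^n; Y_n \mid Y^{n-1})$ exactly from a joint probability mass function of discrete sequences. The helper names (`marginal`, `cond_mutual_info`, `directed_info`) and the toy model (two fair bits $X_1, X_2$, with $Y_1 = 0$ and $Y_2 = X_1$) are illustrative assumptions, not part of the original text.

```python
import itertools
import math

def marginal(pmf, keep):
    """Marginalise a joint pmf {outcome tuple: prob} onto the coordinates in `keep`."""
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_mutual_info(pmf, a, b, c):
    """I(A; B | C) in bits; a, b, c are lists of coordinate indices."""
    p_abc = marginal(pmf, a + b + c)
    p_ac, p_bc, p_c = marginal(pmf, a + c), marginal(pmf, b + c), marginal(pmf, c)
    na, nb = len(a), len(b)
    total = 0.0
    for key, p in p_abc.items():
        if p <= 0.0:
            continue
        va, vb, vc = key[:na], key[na:na + nb], key[na + nb:]
        total += p * math.log2(p * p_c[vc] / (p_ac[va + vc] * p_bc[vb + vc]))
    return total

def directed_info(pmf, src, dst):
    """Sum over n of I(src^n; dst_n | dst^(n-1)), following the definition above."""
    return sum(
        cond_mutual_info(pmf, src[:k + 1], [dst[k]], dst[:k])
        for k in range(len(src))
    )

# Hypothetical toy model with N = 2: outcome tuples are (x1, x2, y1, y2),
# X_1 and X_2 are independent fair bits, Y_1 = 0 and Y_2 = X_1 (a delayed copy).
pmf = {(x1, x2, 0, x1): 0.25 for x1, x2 in itertools.product((0, 1), repeat=2)}
X, Y = [0, 1], [2, 3]

di_forward = directed_info(pmf, X, Y)    # I(X -> Y)
di_reverse = directed_info(pmf, Y, X)    # I(Y -> X)
mi = cond_mutual_info(pmf, X, Y, [])     # I(X; Y)
print(di_forward, di_reverse, mi)
```

In this toy model the forward directed information is 1 bit, the reverse directed information is 0, and the forward value does not exceed the mutual information, in line with the two properties recorded above.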

It is natural to compare the directed information *I*(*X* → *Y*) and the reverse directed information *I*(*Y* → *X*). For general probabilistic models, there does not appear to be an intuitively pleasing connection between these two. However, if one considers instead the following new random sequence $\tilde{Y} = (0, Y_1, Y_2, \ldots, Y_{N-1})$, then the following conservation law applies (Massey and Massey 2005):

$$I(X; Y) = I(X \to Y) + I(\tilde{Y} \to X),$$

where *I*(*X*; *Y*) is the regular Shannon mutual information between *X* and *Y*.
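As a sanity check, the conservation law $I(X; Y) = I(X \to Y) + I(\tilde{Y} \to X)$ can be verified by hand for $N = 2$, where $\tilde{Y} = (0, Y_1)$. The derivation below uses only the definition of directed information and the chain rule for mutual information. It is a worked special case, not the general proof.

$$
\begin{aligned}
I(X \to Y) &= I(X_1; Y_1) + I(X_1, X_2; Y_2 \mid Y_1), \\
I(\tilde{Y} \to X) &= I(0; X_1) + I(0, Y_1; X_2 \mid X_1) = I(Y_1; X_2 \mid X_1), \\
I(X \to Y) + I(\tilde{Y} \to X) &= \underbrace{I(X_1; Y_1) + I(X_2; Y_1 \mid X_1)}_{=\, I(X_1, X_2;\, Y_1)} + I(X_1, X_2; Y_2 \mid Y_1) \\
&= I(X_1, X_2; Y_1, Y_2).
\end{aligned}
$$

The delay in $\tilde{Y}$ is what makes the bookkeeping work: each term of the mutual information is attributed either to the forward direction or to the (strictly delayed) reverse direction, never to both.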

As an explicit example, suppose that $(X_1, X_2, \ldots, X_N)$ are independent normal (Gaussian) random variables of mean zero and variance 1. Let $Y_1$ also be a mean-zero unit-variance normal. For *n* ≥ 2, we set $Y_n = \alpha X_{n-1} + W_n$, where the $W_n$ are independent normal random variables of mean zero and variance $1 - \alpha^2$ and 0 ≤ *α* ≤ 1. Then, the directed information evaluates to

$$I(X \to Y) = \frac{N-1}{2} \log \frac{1}{1 - \alpha^2}.$$

In this example, the reverse directed information vanishes: *I*(*Y* → *X*) = 0.
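The Gaussian example can be checked numerically. The sketch below builds the joint covariance matrix of $(X_1, \ldots, X_N, Y_1, \ldots, Y_N)$ and evaluates each conditional mutual information term through log-determinants. It assumes, beyond what the text states explicitly, that $Y_1$ and the $W_n$ are independent of the sequence *X* and that *α* < 1. The function names and the values `N = 5`, `alpha = 0.8` are illustrative choices.

```python
import numpy as np

def gaussian_cmi(cov, a, b, c):
    """I(A; B | C) in nats for jointly Gaussian variables with covariance `cov`.

    Uses I(A;B|C) = h(A,C) + h(B,C) - h(C) - h(A,B,C); the (2*pi*e) constants cancel.
    """
    def logdet(idx):
        if len(idx) == 0:
            return 0.0
        _, value = np.linalg.slogdet(cov[np.ix_(idx, idx)])
        return value
    return 0.5 * (logdet(a + c) + logdet(b + c) - logdet(c) - logdet(a + b + c))

def directed_info(cov, src, dst):
    """Sum over n of I(src^n; dst_n | dst^(n-1)) for jointly Gaussian sequences."""
    return sum(
        gaussian_cmi(cov, src[:k + 1], [dst[k]], dst[:k])
        for k in range(len(src))
    )

N, alpha = 5, 0.8                      # illustrative values, alpha < 1 assumed
x_idx = list(range(N))                 # positions of X_1..X_N
y_idx = list(range(N, 2 * N))          # positions of Y_1..Y_N

# All variables have unit variance; the only nonzero cross-covariances are
# Cov(X_{n-1}, Y_n) = alpha for n >= 2 (Y_1 is assumed independent of X).
cov = np.eye(2 * N)
for k in range(1, N):
    cov[k - 1, N + k] = cov[N + k, k - 1] = alpha

forward = directed_info(cov, x_idx, y_idx)     # I(X -> Y)
reverse = directed_info(cov, y_idx, x_idx)     # I(Y -> X)
closed_form = (N - 1) / 2 * np.log(1.0 / (1.0 - alpha**2))
print(forward, closed_form, reverse)
```

Under these assumptions the numerical forward value agrees with $\frac{N-1}{2} \log \frac{1}{1-\alpha^2}$ (in nats), and the reverse value is zero up to floating-point error.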

Finally, we note that roughly another 10 years later, directed information was rediscovered (interestingly, again under a stationarity assumption) under the name of transfer entropy (Schreiber 2000).

### Directed Information in Networks

*causal* processing of the output signal. That is, if *Z* is a sequence obtained by processing the sequence *Y* in a causal fashion, possibly involving additional independent randomness, then we have that

As with all information measures, the reverse is not true. More precisely, let us reconsider the scenario just discussed involving the sequences *X*, *Y*, and *Z*. Then, even if both *I*(*X* → *Y*) and *I*(*Y* → *Z*) are large, this *does not* imply any lower bound on *I*(*X* → *Z*); in fact, the latter might even be zero.
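A simple construction, given here as an illustrative assumption rather than an example from the original text, shows how *I*(*X* → *Z*) can vanish even though both links are strong. Let $X_1, \ldots, X_N$ and $V_1, \ldots, V_N$ be mutually independent fair bits, let $Y_n = (X_n, V_n)$, and let $Z_n = V_n$, which is a causal (indeed memoryless) function of *Y*. Measuring information in bits,

$$
\begin{aligned}
I(X \to Y) &= \sum_{n=1}^{N} \bigl[ H(Y_n \mid Y^{n-1}) - H(Y_n \mid X^n, Y^{n-1}) \bigr] = \sum_{n=1}^{N} (2 - 1) = N, \\
I(Y \to Z) &= \sum_{n=1}^{N} \bigl[ H(Z_n \mid Z^{n-1}) - H(Z_n \mid Y^n, Z^{n-1}) \bigr] = \sum_{n=1}^{N} (1 - 0) = N, \\
I(X \to Z) &\le I(X; Z) = 0.
\end{aligned}
$$

Here *Y* carries two independent streams, and *Z* retains only the stream that has nothing to do with *X*, so no information about *X* reaches *Z*.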

### Information Measure of Causality

Directed information can be postulated to be a measure of causality in the following sense: One claims a causal relationship from *X* to *Y* if the directed information *I*(*X* → *Y*) is large. In the explicit example involving normal random variables discussed above, this is easy to see: The directed information from *X* to *Y* is large, and indeed, for this example, we would expect any reasonable line of argument to conclude that the sequence *X* drives the sequence *Y* in a causal fashion (at least as long as *α* is close to one). Moreover, in this example, *I*(*Y* → *X*) = 0, ruling out any causal relationship in the reverse direction.

The remaining issue is to define a threshold on the directed information above which the relationship is claimed to be causal. There is no intuitive a priori rule, and in most cases, the threshold must be selected arbitrarily or using additional knowledge of existing causal connections, e.g., from physiological insight into the considered connection. To make matters more complicated, it is also important to notice that due to the nonnegativity of directed information, most classical estimators must be expected to have a positive bias.
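The positive bias can be illustrated with a small simulation. The sketch below draws two independent binary sequences, so that the true directed information is zero, and applies a plug-in (empirical-frequency) estimate of the single term $I(X_{t-1}; Y_t \mid Y_{t-1})$. Restricting to one step of memory, as well as the function name `plugin_cmi` and the sample sizes, are simplifying assumptions made for this illustration only; this is not an estimator taken from the cited studies.

```python
import numpy as np

def plugin_cmi(a, b, c):
    """Plug-in estimate (nats) of I(A; B | C) from binary sample arrays a, b, c."""
    counts = np.zeros((2, 2, 2))
    np.add.at(counts, (a, b, c), 1.0)            # joint histogram over (a, b, c)
    p = counts / counts.sum()
    p_c = p.sum(axis=(0, 1), keepdims=True)      # empirical p(c)
    p_ac = p.sum(axis=1, keepdims=True)          # empirical p(a, c)
    p_bc = p.sum(axis=0, keepdims=True)          # empirical p(b, c)
    mask = p > 0
    ratio = (p * p_c) / np.where(mask, p_ac * p_bc, 1.0)
    return float(np.sum(p[mask] * np.log(ratio[mask])))

rng = np.random.default_rng(0)
T, trials = 200, 500                             # illustrative sample sizes
estimates = np.empty(trials)
for i in range(trials):
    x = rng.integers(0, 2, size=T)               # X and Y are independent,
    y = rng.integers(0, 2, size=T)               # so the true value is 0
    estimates[i] = plugin_cmi(x[:-1], y[1:], y[:-1])

print("smallest estimate:", estimates.min())
print("average estimate: ", estimates.mean())
```

Every individual estimate is nonnegative (up to rounding), because the plug-in value is itself a conditional mutual information of the empirical distribution. The average over trials is therefore strictly positive although the true value is zero, which is why a threshold cannot simply be set at zero.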

It should also be noted that this is closely related in spirit to the notion of Granger causality (Granger 1969). In the latter, *correlation* (i.e., second-order statistics) is exploited to claim causality. Directed information, by contrast, is sensitive to the full probability distribution in the usual entropy sense. Note that when directed information is considered with normal distributions, it is closely related to Granger causality.
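For the Gaussian example discussed earlier, the connection can be sketched numerically. The code below simulates $Y_t = \alpha X_{t-1} + W_t$ and compares half the log-ratio of two least-squares prediction-error variances (predicting $Y_t$ from $Y_{t-1}$ alone versus from $Y_{t-1}$ and $X_{t-1}$) with the per-step directed information $\frac{1}{2} \log \frac{1}{1-\alpha^2}$. The choice of this particular statistic as a stand-in for Granger causality, the one-step regression order, and all names and parameter values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, T = 0.8, 20000                            # illustrative values

# Simulate the example: X i.i.d. N(0, 1), Y_t = alpha * X_{t-1} + W_t.
x = rng.standard_normal(T)
w = rng.standard_normal(T) * np.sqrt(1.0 - alpha**2)
y = np.empty(T)
y[0] = rng.standard_normal()
y[1:] = alpha * x[:-1] + w[1:]

def residual_variance(target, regressors):
    """Mean squared residual of a least-squares fit of `target` on `regressors`."""
    design = np.column_stack(regressors)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.mean((target - design @ coef) ** 2))

target = y[1:]
var_restricted = residual_variance(target, [y[:-1]])         # past of Y only
var_full = residual_variance(target, [y[:-1], x[:-1]])       # past of Y and X

granger_half_log_ratio = 0.5 * np.log(var_restricted / var_full)
per_step_di = 0.5 * np.log(1.0 / (1.0 - alpha**2))
print(granger_half_log_ratio, per_step_di)
```

With a long enough simulated record the two numbers agree closely. The agreement relies on the variables being jointly Gaussian; for other distributions, second-order statistics and directed information can differ.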

### Experimental Studies

Directed information measures have recently been applied to simultaneous recordings in the primary motor cortex of rodents and macaque monkeys (Quinn et al. 2011; So et al. 2012), leading to conjectured causality maps (directed graphs) between the observed neurons. There is also a rich literature on the transfer entropy mentioned above, which is discussed elsewhere.

## References

- Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37(3):424–438
- Kramer G (1998) Directed information for channels with feedback, vol 11, ETH series in information processing. Hartung-Gorre, Konstanz
- Marko H (1973) The bidirectional communication theory – a generalization of information theory. IEEE Trans Commun 21:1345–1351
- Massey JL (1990) Causality, feedback and directed information. In: Proceedings of the 1990 international symposium on information theory and its applications, Hawaii, pp 303–305
- Massey JL, Massey PC (2005) Conservation of mutual and directed information. In: Proceedings of the 2005 international symposium on information theory, Adelaide, pp 157–158
- Quinn C, Coleman TP, Kiyavash N, Hatsopoulos NG (2011) Estimating the directed information to infer causal relationships in ensemble neural spike train recordings. J Comput Neurosci. doi:10.1007/s10827-010-0247-2
- Schreiber T (2000) Measuring information transfer. Phys Rev Lett 85:461–464
- Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 623–656
- So K, Koralek AC, Ganguly K, Gastpar MC, Carmena JM (2012) Assessing functional connectivity of neural ensembles using directed information. J Neural Eng 9:026004