Directed Information Flow and Causality in Neural Systems
Keywords: Mutual information; Directed information; Granger causality; Information measure; Normal random variable
In the human experience, information typically flows from one place to another. By contrast, the notion of mutual information introduced by Shannon (1948) is perfectly symmetric in its arguments and does not distinguish between “input” and “output.” In this sense, it is perhaps surprising that this very measure of information indeed captures the capacity of any communication channel – though we should recall that the proof of this fundamental fact is not merely a simple consequence of Shannon’s definition.
In spite of Shannon’s strong and fundamental results, it has been tempting to define a notion of directed information. This was first proposed by Marko (1973) for stationary processes. The more general and useful definition was given in the brief and beautiful note by Massey (1990). Moreover, Massey (1990), Kramer (1998), and subsequent work revealed that directed information has a natural place in the study of information transmission with feedback from the output to the input.
In the present note, a different aspect of directed information is illuminated: that of identifying causal relationships. The basic idea is that if there is a causal relationship from one process to another, the directed information in the forward direction should be large. Additionally, one may also require the directed information in the reverse direction to be small.
It is important, however, to observe that inferring causality from observations is an ill-posed question. This has been discussed in depth in the context of Granger causality (Granger 1969), where a rich literature exposes several fundamental issues. The same qualitative issues apply to any causality argument based on directed information measures. The difference is that Granger causality uses correlation measures, whereas here, we consider directed information.
The usual mutual information can be defined between any two random variables. For directed information, however, we need to consider ordered sequences of random variables.
In this example, the reverse directed information vanishes: I(Y → X) = 0.
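Concretely, for length-n sequences Massey (1990) defines I(X^n → Y^n) = Σ_{i=1}^n I(X^i; Y_i | Y^{i-1}), replacing the full input sequence X^n in the chain-rule expansion of mutual information by the causally available prefix X^i; for the reverse direction, one common convention inserts a unit delay, Σ_{i=1}^n I(Y^{i-1}; X_i | X^{i-1}) (cf. Massey and Massey 2005). The following is a minimal brute-force sketch for short binary sequences; the memoryless model Y_i = X_i XOR N_i is our illustrative assumption, not the normal-random-variable example referred to in the text:

```python
import itertools
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def joint_pmf(n, flip=0.1):
    """Joint pmf over (x^n, y^n): X_i i.i.d. fair bits, Y_i = X_i flipped
    with probability `flip` (a memoryless binary symmetric channel)."""
    pmf = {}
    for x in itertools.product((0, 1), repeat=n):
        for y in itertools.product((0, 1), repeat=n):
            p = 0.5 ** n
            for xi, yi in zip(x, y):
                p *= (1 - flip) if xi == yi else flip
            pmf[(x, y)] = p
    return pmf

def _cond_mi(pmf, a_of, b_of, c_of):
    """I(A; B | C), where a_of/b_of/c_of select parts of a sample (x, y)."""
    pabc, pac, pbc, pc = {}, {}, {}, {}
    for (x, y), p in pmf.items():
        a, b, c = a_of(x, y), b_of(x, y), c_of(x, y)
        pabc[(a, b, c)] = pabc.get((a, b, c), 0.0) + p
        pac[(a, c)] = pac.get((a, c), 0.0) + p
        pbc[(b, c)] = pbc.get((b, c), 0.0) + p
        pc[c] = pc.get(c, 0.0) + p
    total = 0.0
    for (a, b, c), p in pabc.items():
        if p > 0:
            total += p * math.log2(p * pc[c] / (pac[(a, c)] * pbc[(b, c)]))
    return total

def directed_information(pmf, n):
    """I(X^n -> Y^n) = sum_i I(X^i; Y_i | Y^{i-1})  (Massey 1990)."""
    return sum(_cond_mi(pmf,
                        lambda x, y, i=i: x[:i + 1],
                        lambda x, y, i=i: y[i],
                        lambda x, y, i=i: y[:i])
               for i in range(n))

def reverse_delayed(pmf, n):
    """sum_i I(Y^{i-1}; X_i | X^{i-1}): the unit-delayed reverse direction."""
    return sum(_cond_mi(pmf,
                        lambda x, y, i=i: y[:i],
                        lambda x, y, i=i: x[i],
                        lambda x, y, i=i: x[:i])
               for i in range(n))

pmf = joint_pmf(2, flip=0.1)
forward = directed_information(pmf, 2)  # equals 2 * (1 - h2(0.1)), about 1.06 bits
reverse = reverse_delayed(pmf, 2)       # 0: the model has no feedback
```

Because this channel model has no feedback, the forward directed information coincides with the mutual information I(X^n; Y^n), while the delayed reverse direction vanishes, mirroring the asymmetry described above.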
Finally, we also note that, another 10 years later, directed information was rediscovered (interestingly, again with a stationarity assumption) under the name of transfer entropy (Schreiber 2000).
Directed Information in Networks
As with all information measures, the reverse is not true. More precisely, let us reconsider the scenario just discussed involving the sequences X, Y, and Z. Then, even if both I(X → Y) and I(Y → Z) are large, this does not imply any lower bound on I(X → Z); in fact, the latter might even be zero.
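A simple construction (our illustrative assumption, not an example from the original text) shows why no such lower bound can hold. Let X and W be independent i.i.d. fair-bit sequences, and define

```latex
% Y carries X alongside an independent sequence W; Z copies only W.
Y_i = (X_i, W_i), \qquad Z_i = W_i, \qquad W \perp X .
```

Then I(X → Y) = Σ_i H(X_i) = n bits and I(Y → Z) = Σ_i H(W_i) = n bits are both maximal, yet I(X → Z) = 0 because Z is independent of X.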
Information Measure of Causality
Directed information can be postulated to be a measure of causality in the following sense: One claims a causal relationship from X to Y if the directed information I(X → Y) is large. In the explicit example involving normal random variables discussed above, this is easy to see: The directed information from X to Y is large, and indeed, for this example, we would expect any rationale to conclude that the sequence X drives the sequence Y in a causal fashion (at least as long as α is close to one). Moreover, in this example, I(Y → X) = 0, ruling out any causal relationship in the reverse direction.
The remaining issue is to define a threshold on the directed information above which the relationship is claimed to be causal. There is no intuitive a priori rule, and in most cases, the threshold must be selected arbitrarily or using additional knowledge of existing causal connections, e.g., from physiological insight into the considered connection. To make matters more complicated, it is also important to notice that due to the nonnegativity of directed information, most classical estimators must be expected to have a positive bias.
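The positive bias is visible already in the plug-in estimator of ordinary mutual information: for independent variables, the true value is zero, but the empirical estimate is nonnegative and almost surely strictly positive. A minimal Monte Carlo sketch (sample sizes and distributions are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def plugin_mi(x, y):
    """Plug-in (empirical) mutual information in bits for two binary samples."""
    n = len(x)
    joint = np.zeros((2, 2))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1.0 / n
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    return float((joint[mask] *
                  np.log2(joint[mask] / np.outer(px, py)[mask])).sum())

# X and Y independent fair bits: true mutual information is exactly 0 bits,
# yet every plug-in estimate is >= 0, so the estimator is positively biased.
estimates = [plugin_mi(rng.integers(0, 2, size=100),
                       rng.integers(0, 2, size=100))
             for _ in range(200)]
mean_bias = float(np.mean(estimates))  # small but strictly positive
```

The first-order theory predicts a bias of roughly (|X| − 1)(|Y| − 1)/(2n ln 2) bits for sample size n, here about 0.007 bits; the same qualitative effect afflicts estimators of directed information, which is why a significance threshold cannot simply be set at zero.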
It should also be noted that this is closely related in spirit to the notion of Granger causality (Granger 1969). In the latter, correlation (i.e., second-order statistics) is exploited to claim causality. Directed information, by contrast, is sensitive to the full probability distribution in the usual entropy sense. Note that when directed information is considered with normal distributions, it is closely related to Granger causality.
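For jointly Gaussian stationary processes, this relationship can be made exact; it is known from the Gaussian transfer-entropy literature that the directed measure equals, in nats, half the Granger statistic. Sketched here with F and T as (our) notation for the Granger causality and transfer-entropy rates:

```latex
% Granger causality: log-ratio of prediction-error variances, with and
% without the history of X available.
\mathcal{F}_{X \to Y}
  = \ln \frac{\operatorname{Var}\!\left(Y_i \mid Y^{i-1}\right)}
             {\operatorname{Var}\!\left(Y_i \mid Y^{i-1}, X^{i-1}\right)},
\qquad
T_{X \to Y} = \tfrac{1}{2}\,\mathcal{F}_{X \to Y}.
```

The factor 1/2 comes from the entropy of a normal random variable, h = (1/2) ln(2πeσ²), so that the conditional-entropy difference defining the information rate reduces to half the log-variance ratio.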
Directed information measures have recently been applied to simultaneous recordings in the primary motor cortex of rodents and macaque monkeys (Quinn et al. 2011; So et al. 2012), leading to conjectured causality maps (directed graphs) between the observed neurons. An additional rich literature concerns the transfer entropy mentioned above, which is discussed elsewhere.
- Kramer G (1998) Directed information for channels with feedback. ETH series in information processing, vol 11. Hartung-Gorre, Konstanz
- Massey JL (1990) Causality, feedback and directed information. In: Proceedings of the 1990 international symposium on information theory and its applications, Hawaii, pp 303–305
- Massey JL, Massey PC (2005) Conservation of mutual and directed information. In: Proceedings of the 2005 international symposium on information theory, Adelaide, pp 157–158