Information Theory of Decisions and Actions

  • Naftali Tishby
  • Daniel Polani
Part of the Springer Series in Cognitive and Neural Systems book series (SSCNS)


The perception–action cycle is often defined as “the circular flow of information between an organism and its environment in the course of a sensory guided sequence of actions towards a goal” (Fuster, Neuron 30:319–333, 2001; International Journal of Psychophysiology 60(2):125–132, 2006). The question we address in this chapter is in what sense this “flow of information” can be described by Shannon’s measures of information introduced in his mathematical theory of communication. We provide an affirmative answer to this question using an intriguing analogy between Shannon’s classical model of communication and the perception–action cycle. In particular, decision and action sequences turn out to be directly analogous to codes in communication, and their complexity – the minimal number of (binary) decisions required for reaching a goal – is directly bounded by information measures, as in communication. This analogy allows us to extend the standard reinforcement learning framework. The latter considers the future expected reward in the course of a behaviour sequence towards a goal (value-to-go). Here, we additionally incorporate a measure of information associated with this sequence: the cumulative information processing cost or bandwidth required to specify the future decision and action sequence (information-to-go). Using a graphical model, we derive a recursive Bellman optimality equation for information measures, in analogy to reinforcement learning; from this, we obtain new algorithms for calculating the optimal trade-off between the value-to-go and the required information-to-go, unifying the ideas behind the Bellman and the Blahut–Arimoto iterations. This trade-off between value-to-go and information-to-go provides a complete analogy with the compression–distortion trade-off in source coding. The new formulation connects seemingly unrelated optimization problems. The algorithm is demonstrated on grid world examples.
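The trade-off between value-to-go and information-to-go can be illustrated with a small numerical sketch. The code below is not the chapter's algorithm: it is a simplified stand-in that assumes a one-dimensional corridor world, a reward of -1 per step, and a fixed uniform action prior, so it performs only the Bellman-style backup and omits the Blahut–Arimoto-style re-estimation of the prior. The function name `solve_tradeoff`, the parameter `beta` (the weight on reward relative to information cost), and all other names are hypothetical.

```python
import numpy as np


def solve_tradeoff(beta, n_states=5, n_iter=5000, tol=1e-12):
    """Free-energy iteration on a 1-D corridor whose last cell is the goal.

    All modelling choices here are assumptions made for illustration:
    two actions (left/right), reward -1 per step, a fixed uniform action
    prior, and deterministic moves that are clipped at the left wall.
    """
    goal = n_states - 1
    moves = (-1, +1)
    prior = np.array([0.5, 0.5])
    reward = -1.0
    # nxt[s, a] is the state reached from s under action a.
    nxt = np.array([[min(max(s + m, 0), goal) for m in moves]
                    for s in range(n_states)])

    # Bellman-style backup on F(s), the minimum over policies of
    # (information-to-go) - beta * (value-to-go); F is 0 at the goal.
    F = np.zeros(n_states)
    for _ in range(n_iter):
        G = -beta * reward + F[nxt]                 # shape (states, actions)
        F_new = -np.log((prior * np.exp(-G)).sum(axis=1))
        F_new[goal] = 0.0
        done = np.max(np.abs(F_new - F)) < tol
        F = F_new
        if done:
            break

    # Boltzmann-like policy that attains the minimum in the backup.
    G = -beta * reward + F[nxt]
    policy = prior * np.exp(-(G - G.min(axis=1, keepdims=True)))
    policy /= policy.sum(axis=1, keepdims=True)

    # Evaluate both quantities for this policy by solving the linear
    # expected-sum equations over the non-goal states.
    kl = (policy * np.log(policy / prior)).sum(axis=1)   # nats per decision
    T = np.zeros((n_states, n_states))
    for s in range(goal):
        for a in range(len(moves)):
            T[s, nxt[s, a]] += policy[s, a]
    A = np.eye(goal) - T[:goal, :goal]
    value = np.zeros(n_states)
    info = np.zeros(n_states)
    value[:goal] = np.linalg.solve(A, np.full(goal, reward))
    info[:goal] = np.linalg.solve(A, kl[:goal])
    return F, policy, value, info


if __name__ == "__main__":
    for beta in (0.1, 1.0, 5.0):
        F, policy, value, info = solve_tradeoff(beta)
        print(f"beta={beta}: value-to-go={value[0]:.3f}, "
              f"information-to-go={info[0] / np.log(2):.3f} bits")
```

Under these assumptions, increasing `beta` should move the start state towards a higher value-to-go (fewer expected steps) at the price of a higher information-to-go, which for a nearly deterministic policy approaches roughly one bit per binary decision. Very large `beta` values may drive some action probabilities to zero in floating point, which this sketch does not guard against.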


Mutual Information, Bayesian Network, Optimal Policy, Reinforcement Learning, Action Cycle
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



The authors would like to thank Jonathan Rubin for carrying out the simulations and the preparation of the corresponding diagrams.


  1. Ashby, W. R., (1956). An Introduction to Cybernetics. London: Chapman & Hall Ltd.
  2. Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E., (2008). Predictive Information and Explorative Behavior of Autonomous Robots. European Physical Journal B, 63:329–339.
  3. Ay, N., and Polani, D., (2008). Information Flows in Causal Networks. Advances in Complex Systems, 11(1):17–41.
  4. Ay, N., and Wennekers, T., (2003). Dynamical Properties of Strongly Interacting Markov Chains. Neural Networks, 16(10):1483–1497.
  5. Berger, T., (2003). Living Information Theory – The 2002 Shannon Lecture. IEEE Information Theory Society Newsletter, 53(1):1,6–19.
  6. Bialek, W., de Ruyter van Steveninck, R. R., and Tishby, N., (2007). Efficient representation as a design principle for neural coding and computation. [q-bio.NC].
  7. Bialek, W., Nemenman, I., and Tishby, N., (2001). Predictability, complexity and learning. Neural Computation, 13:2409–2463.
  8. Brenner, N., Bialek, W., and de Ruyter van Steveninck, R., (2000). Adaptive rescaling optimizes information transmission. Neuron, 26:695–702.
  9. Cover, T. M., and Thomas, J. A., (1991). Elements of Information Theory. New York: Wiley.
  10. Csiszár, I., and Körner, J., (1986). Information Theory: Coding Theorems for Discrete Memoryless Systems. Budapest: Academiai Kiado.
  11. Der, R., Steinmetz, U., and Pasemann, F., (1999). Homeokinesis – A new principle to back up evolution with learning. In Mohammadian, M., editor, Computational Intelligence for Modelling, Control, and Automation, vol. 55 of Concurrent Systems Engineering Series, 43–47. Amsterdam: IOS.
  12. Ellison, C., Mahoney, J., and Crutchfield, J., (2009). Prediction, Retrodiction, and the Amount of Information Stored in the Present. Journal of Statistical Physics, 136(6):1005–1034.
  13. Engel, Y., Mannor, S., and Meir, R., (2003). Bayes meets Bellman: The Gaussian Process Approach to Temporal Difference Learning. In Proceedings of ICML 20, 154–161.
  14. Friston, K., (2009). The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences, 13(7):293–301.
  15. Friston, K., Kilner, J., and Harrison, L., (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100:70–87.
  16. Fry, R. L., (2008). Computation by Neural and Cortical Systems. Presentation at the Workshop at CNS*2008, Portland, OR: Methods of Information Theory in Computational Neuroscience.
  17. Fuster, J. M., (2001). The Prefrontal Cortex – An Update: Time Is of the Essence. Neuron, 30:319–333.
  18. Fuster, J. M., (2006). The cognit: A network model of cortical representation. International Journal of Psychophysiology, 60(2):125–132.
  19. Gastpar, M., Rimoldi, B., and Vetterli, M., (2003). To Code, or Not to Code: Lossy Source-Channel Communication Revisited. IEEE Transactions on Information Theory, 49(5):1147–1158.
  20. Globerson, A., Stark, E., Vaadia, E., and Tishby, N., (2009). The Minimum Information principle and its application to neural code analysis. PNAS, 106(9):3490–3495.
  21. Haken, H., (1983). Advanced synergetics. Berlin: Springer.
  22. Howard, R. A., (1966). Information value theory. IEEE Transactions on Systems Science and Cybernetics, SSC-2:22–26.
  23. Jung, T., and Polani, D., (2007). Kernelizing LSPE(λ). In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, April 1–5, Hawaii, 338–345.
  24. Kappen, B., Gomez, V., and Opper, M., (2009). Optimal control as a graphical model inference problem. arXiv:0901.0633v2 [cs.AI].
  25. Kelly, J. L., (1956). A New Interpretation of Information Rate. Bell System Technical Journal, 35:917–926.
  26. Klyubin, A., Polani, D., and Nehaniv, C., (2007). Representations of Space and Time in the Maximization of Information Flow in the Perception-Action Loop. Neural Computation, 19(9):2387–2432.
  27. Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2004). Organization of the Information Flow in the Perception-Action Loop of Evolved Agents. In Proceedings of 2004 NASA/DoD Conference on Evolvable Hardware, 177–180. IEEE Computer Society.
  28. Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2005a). All Else Being Equal Be Empowered. In Advances in Artificial Life, European Conference on Artificial Life (ECAL 2005), vol. 3630 of LNAI, 744–753. Berlin: Springer.
  29. Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2005b). Empowerment: A Universal Agent-Centric Measure of Control. In Proceedings of the IEEE Congress on Evolutionary Computation, 2–5 September 2005, Edinburgh, Scotland (CEC 2005), 128–135. IEEE.
  30. Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2008). Keep Your Options Open: An Information-Based Driving Principle for Sensorimotor Systems. PLoS ONE, 3(12):e4018, Dec 2008.
  31. Laughlin, S. B., (2001). Energy as a constraint on the coding and processing of sensory information. Current Opinion in Neurobiology, 11:475–480.
  32. Lizier, J., Prokopenko, M., and Zomaya, A., (2007). Detecting non-trivial computation in complex dynamics. In Almeida e Costa, F., Rocha, L. M., Costa, E., Harvey, I., and Coutinho, A., editors, Advances in Artificial Life (Proceedings of the ECAL 2007, Lisbon), vol. 4648 of LNCS, 895–904. Berlin: Springer.
  33. Lungarella, M., and Sporns, O., (2005). Information Self-Structuring: Key Principle for Learning and Development. In Proceedings of 4th IEEE International Conference on Development and Learning, 25–30. IEEE.
  34. Lungarella, M., and Sporns, O., (2006). Mapping Information Flow in Sensorimotor Networks. PLoS Computational Biology, 2(10):e144.
  35. Massey, J., (1990). Causality, feedback and directed information. In Proceedings of the International Symposium on Information Theory and its Applications (ISITA-90), 303–305.
  36. McAllester, D. A., (1999). PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, Santa Cruz, CA, 164–170. New York: ACM.
  37. Pearl, J., (2000). Causality: Models, Reasoning and Inference. Cambridge, UK: Cambridge University Press.
  38. Pfeifer, R., and Bongard, J., (2007). How the Body Shapes the Way We think: A New View of Intelligence. Bradford Books.
  39. Polani, D., (2009). Information: Currency of Life?. HFSP Journal, 3(5):307–316, Nov 2009.
  40. Polani, D., Martinetz, T., and Kim, J., (2001). An Information-Theoretic Approach for the Quantification of Relevance. In Kelemen, J., and Sosik, P., editors, Advances in Artificial Life (Proceedings of the 6th European Conference on Artificial Life), vol. 2159 of LNAI, 704–713. Berlin: Springer.
  41. Polani, D., Nehaniv, C., Martinetz, T., and Kim, J. T., (2006). Relevant Information in Optimized Persistence vs. Progeny Strategies. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proceedings of Artificial Life X, 337–343.
  42. Prokopenko, M., Gerasimov, V., and Tanev, I., (2006). Evolving Spatiotemporal Coordination in a Modular Robotic System. In Nolfi, S., Baldassarre, G., Calabretta, R., Hallam, J. C. T., Marocco, D., Meyer, J.-A., Miglino, O., and Parisi, D., editors, From Animals to Animats 9: 9th International Conference on the Simulation of Adaptive Behavior (SAB 2006), Rome, Italy, vol. 4095 of Lecture Notes in Computer Science, 558–569. Berlin: Springer.
  43. Rubin, J., Shamir, O., and Tishby, N., (2010). A PAC-Bayesian Analysis of Reinforcement Learning. In Proceedings of AISTAT 2010.
  44. Saerens, M., Achbany, Y., Fuss, F., and Yen, L., (2009). Randomized Shortest-Path Problems: Two Related Models. Neural Computation, 21:2363–2404.
  45. Seldin, Y., and Tishby, N., (2009). PAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AIStats 2009), vol. 5 of JMLR Workshop and Conference Proceedings.
  46. Shalizi, C. R., and Crutchfield, J. P., (2002). Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction. Advances in Complex Systems, 5:1–5.
  47. Shannon, C. E., (1949). The Mathematical Theory of Communication. In Shannon, C. E., and Weaver, W., editors, The Mathematical Theory of Communication. Urbana: The University of Illinois Press.
  48. Slonim, N., Friedman, N., and Tishby, N., (2006). Multivariate Information Bottleneck. Neural Computation, 18(8):1739–1789.
  49. Sporns, O., and Lungarella, M., (2006). Evolving coordinated behavior by maximizing information structure. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proceedings of Artificial Life X, 323–329.
  50. Still, S., (2009). Information-theoretic approach to interactive learning. EPL (Europhysics Letters), 85(2):28005–28010.
  51. Strens, M., (2000). A Bayesian Framework for Reinforcement Learning. In Langley, P., editor, Proceedings of the 17th International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 – July 2, 2000. Morgan Kaufmann.
  52. Sutton, R. S., and Barto, A. G., (1998). Reinforcement Learning. Cambridge, Mass.: MIT.
  53. Taylor, S. F., Tishby, N., and Bialek, W., (2007). Information and Fitness. [q-bio.PE].
  54. Tishby, N., Pereira, F. C., and Bialek, W., (1999). The Information Bottleneck Method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, Illinois. Urbana-Champaign.
  55. Todorov, E., (2009). Efficient computation of optimal actions. PNAS, 106(28):11478–11483.
  56. Touchette, H., and Lloyd, S., (2000). Information-Theoretic Limits of Control. Physical Review Letters, 84:1156.
  57. Touchette, H., and Lloyd, S., (2004). Information-theoretic approach to the study of control systems. Physica A, 331:140–172.
  58. van Dijk, S. G., Polani, D., and Nehaniv, C. L., (2009). Hierarchical Behaviours: Getting the Most Bang for your Bit. In Kampis, G., and Szathmáry, E., editors, Proceedings of the European Conference on Artificial Life 2009, Budapest. Springer.
  59. Vergassola, M., Villermaux, E., and Shraiman, B. I., (2007). ‘Infotaxis’ as a strategy for searching without gradients. Nature, 445:406–409.
  60. Wennekers, T., and Ay, N., (2005). Finite State Automata Resulting From Temporal Information Maximization. Neural Computation, 17(10):2258–2290.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. School of Engineering and Computer Science, Interdisciplinary Center for Neural Computation, The Sudarsky Center for Computational Biology, Hebrew University Jerusalem, Jerusalem, Israel
