Information Theory of Decisions and Actions

Perception-Action Cycle

Part of the book series: Springer Series in Cognitive and Neural Systems ((SSCNS))

Abstract

The perception–action cycle is often defined as “the circular flow of information between an organism and its environment in the course of a sensory guided sequence of actions towards a goal” (Fuster, Neuron 30:319–333, 2001; International Journal of Psychophysiology 60(2):125–132, 2006). The question we address in this chapter is in what sense this “flow of information” can be described by Shannon’s measures of information introduced in his mathematical theory of communication. We provide an affirmative answer to this question using an intriguing analogy between Shannon’s classical model of communication and the perception–action cycle. In particular, decision and action sequences turn out to be directly analogous to codes in communication, and their complexity – the minimal number of (binary) decisions required for reaching a goal – directly bounded by information measures, as in communication. This analogy allows us to extend the standard reinforcement learning framework. The latter considers the future expected reward in the course of a behaviour sequence towards a goal (value-to-go). Here, we additionally incorporate a measure of information associated with this sequence: the cumulated information processing cost or bandwidth required to specify the future decision and action sequence (information-to-go). Using a graphical model, we derive a recursive Bellman optimality equation for information measures, in analogy to reinforcement learning; from this, we obtain new algorithms for calculating the optimal trade-off between the value-to-go and the required information-to-go, unifying the ideas behind the Bellman and the Blahut–Arimoto iterations. This trade-off between value-to-go and information-to-go provides a complete analogy with the compression–distortion trade-off in source coding. The present new formulation connects seemingly unrelated optimization problems. The algorithm is demonstrated on grid world examples.
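
The trade-off described in the abstract can be illustrated with a much simplified sketch. It is not the chapter's algorithm, which the abstract does not spell out. The sketch assumes a hypothetical 1-D corridor with two actions, a cost of 1 per step until an absorbing goal, and a fixed uniform action prior (`PRIOR`); it counts only the per-step decision cost log(π(a|s)/prior(a)) as "information" and omits the Blahut–Arimoto-style update of the prior. Under these assumptions the Bellman-style recursion has a closed-form soft backup, and the parameter `beta` sets how strongly step cost is weighted against decision bits.

```python
import math

MOVES = (-1, +1)    # hypothetical action set: step left, step right
PRIOR = (0.5, 0.5)  # fixed default action distribution (an assumption)


def step(s, move, n_states):
    """Deterministic corridor dynamics with walls at both ends."""
    return min(max(s + move, 0), n_states - 1)


def free_energy(n_states, goal, beta, sweeps=500):
    """Iterate G(s) = -log sum_a prior(a) * exp(-(beta * cost + G(s'))).

    G(s) is the minimum over policies of the expected sum of
    log(pi(a|s) / prior(a)) + beta * cost along the path to the goal,
    with cost 1 per step (the information term is measured in nats).
    """
    G = [0.0] * n_states
    for _ in range(sweeps):
        new_G = []
        for s in range(n_states):
            if s == goal:
                new_G.append(0.0)  # absorbing goal: nothing left to pay
                continue
            z = sum(p * math.exp(-(beta + G[step(s, m, n_states)]))
                    for p, m in zip(PRIOR, MOVES))
            new_G.append(-math.log(z))
        G = new_G
    return G


def policy(G, s, beta, n_states):
    """Soft-optimal policy pi(a|s) proportional to prior(a) * exp(-(beta + G(s')))."""
    weights = [p * math.exp(-(beta + G[step(s, m, n_states)]))
               for p, m in zip(PRIOR, MOVES)]
    z = sum(weights)
    return [w / z for w in weights]


def decision_bits(pi):
    """KL divergence KL(pi || PRIOR) in bits: the per-step decision cost."""
    return sum(q * math.log2(q / p) for q, p in zip(pi, PRIOR) if q > 0)


# Policy and decision cost in the state next to the goal, for several beta.
for beta in (0.0, 1.0, 5.0):
    G = free_energy(5, goal=4, beta=beta)
    pi = policy(G, 3, beta, 5)
    print(beta, [round(q, 3) for q in pi], round(decision_bits(pi), 3))
```

With `beta = 0` the agent simply follows the prior and spends no decision bits; as `beta` grows the policy approaches the deterministic shortest path and the per-step cost approaches one bit, that is, one binary decision.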

Notes

  1. To simplify the derivations, we will always assume convergence of the rewards and not make use of the usual MDP discount factor; in particular, we assume either episodic tasks or non-episodic (continuing) tasks for which the reward converges. For further discussion, see also the remark in Sect. 19.8.2 on soft policies.

  2. Valuable information is not to be confused with the value of information introduced in Howard (1966). Although it serves a similar purpose, the latter is conceptually different: it measures the value difference attainable in a decision when knowing vs. not knowing the outcome of a given random variable. Stated informally, it could be seen as a “non-information-theoretic conjugate” of valuable information.

  3. This is a transition graph and should not be confused with the Bayesian Network Graph.

  4. Note that, unless stated otherwise, we always imply that the distributions \(\hat{p}({s}_{t+1}),\hat{p}({s}_{t+2}),\ldots \) as well as \(\hat{\pi }({a}_{t+1}),\hat{\pi }({a}_{t+2}),\ldots \) can be different for different \(t\).

  5. The interpretation of a negative information gain is that, in the presence (or upon observation) of a particular condition, the subsequent distributions are blurred. As a caricature, suppose that in solving a crime one holds a probability distribution sharply concentrated on a particular suspect. If additional evidence then excluded that suspect from consideration and reset the distribution to cover all remaining suspects equally, this would be an example of negative information gain.

  6. Alternatively, one could minimize \({F}^{\pi }\) by setting the gradient of \({F}^{\pi }\) with respect to \(p({s}_{t+1}\mid {s}_{t},{a}_{t})\) to 0, similar to the derivation of (19.27)–(19.29), under the assumption that \(\pi \) is already optimized. This implements the assumption that the adaptation of the environmental channel is “slow”, corresponding to the adaptation of the agent policy.
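
The crime-suspect caricature in Note 5 can be made concrete with a small numerical sketch. It assumes that information gain is measured as the reduction in Shannon entropy between the belief before and after the evidence, which is one common reading and not necessarily the chapter's exact definition; the four-suspect probabilities are hypothetical.

```python
import math


def entropy_bits(dist):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def information_gain(before, after):
    """Entropy reduction H(before) - H(after); negative if 'after' is more spread out."""
    return entropy_bits(before) - entropy_bits(after)


# Hypothetical numbers: four suspects, belief concentrated on the first one.
before = [0.97, 0.01, 0.01, 0.01]
# New evidence clears suspect 1 and leaves the other three equally likely.
after = [0.0, 1 / 3, 1 / 3, 1 / 3]

gain = information_gain(before, after)
print(f"H(before) = {entropy_bits(before):.3f} bits")
print(f"H(after)  = {entropy_bits(after):.3f} bits")
print(f"gain      = {gain:.3f} bits")
```

The belief after the evidence is more spread out than before, so the entropy rises and the gain comes out negative, even though the evidence itself was informative about the first suspect.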

References

  • Ashby, W. R., (1956). An Introduction to Cybernetics. London: Chapman & Hall Ltd.

  • Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E., (2008). Predictive Information and Explorative Behavior of Autonomous Robots. European Physical Journal B, 63:329–339.

  • Ay, N., and Polani, D., (2008). Information Flows in Causal Networks. Advances in Complex Systems, 11(1):17–41.

  • Ay, N., and Wennekers, T., (2003). Dynamical Properties of Strongly Interacting Markov Chains. Neural Networks, 16(10):1483–1497.

  • Berger, T., (2003). Living Information Theory – The 2002 Shannon Lecture. IEEE Information Theory Society Newsletter, 53(1):1,6–19.

  • Bialek, W., de Ruyter van Steveninck, R. R., and Tishby, N., (2007). Efficient representation as a design principle for neural coding and computation. arXiv.org:0712.4381 [q-bio.NC].

  • Bialek, W., Nemenman, I., and Tishby, N., (2001). Predictability, complexity and learning. Neural Computation, 13:2409–2463.

  • Brenner, N., Bialek, W., and de Ruyter van Steveninck, R., (2000). Adaptive rescaling optimizes information transmission. Neuron, 26:695–702.

  • Cover, T. M., and Thomas, J. A., (1991). Elements of Information Theory. New York: Wiley.

  • Csiszár, I., and Körner, J., (1986). Information Theory: Coding Theorems for Discrete Memoryless Systems. Budapest: Academiai Kiado.

  • Der, R., Steinmetz, U., and Pasemann, F., (1999). Homeokinesis – A new principle to back up evolution with learning. In Mohammadian, M., editor, Computational Intelligence for Modelling, Control, and Automation, vol. 55 of Concurrent Systems Engineering Series, 43–47. Amsterdam: IOS.

  • Ellison, C., Mahoney, J., and Crutchfield, J., (2009). Prediction, Retrodiction, and the Amount of Information Stored in the Present. Journal of Statistical Physics, 136(6):1005–1034.

  • Engel, Y., Mannor, S., and Meir, R., (2003). Bayes meets Bellman: The Gaussian Process Approach to Temporal Difference Learning. In Proceedings of ICML 20, 154–161.

  • Friston, K., (2009). The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences, 13(7):293–301.

  • Friston, K., Kilner, J., and Harrison, L., (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100:70–87.

  • Fry, R. L., (2008). Computation by Neural and Cortical Systems. Presentation at the Workshop at CNS*2008, Portland, OR: Methods of Information Theory in Computational Neuroscience.

  • Fuster, J. M., (2001). The Prefrontal Cortex – An Update: Time Is of the Essence. Neuron, 30:319–333.

  • Fuster, J. M., (2006). The cognit: A network model of cortical representation. International Journal of Psychophysiology, 60(2):125–132.

  • Gastpar, M., Rimoldi, B., and Vetterli, M., (2003). To Code, or Not to Code: Lossy Source-Channel Communication Revisited. IEEE Transactions on Information Theory, 49(5):1147–1158.

  • Globerson, A., Stark, E., Vaadia, E., and Tishby, N., (2009). The Minimum Information principle and its application to neural code analysis. PNAS, 106(9):3490–3495.

  • Haken, H., (1983). Advanced synergetics. Berlin: Springer.

  • Howard, R. A., (1966). Information value theory. IEEE Transactions on Systems Science and Cybernetics, SSC-2:22–26.

  • Jung, T., and Polani, D., (2007). Kernelizing LSPE(λ). In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, April 1–5, Hawaii, 338–345.

  • Kappen, B., Gomez, V., and Opper, M., (2009). Optimal control as a graphical model inference problem. arXiv:0901.0633v2 [cs.AI].

  • Kelly, J. L., (1956). A New Interpretation of Information Rate. Bell System Technical Journal, 35:917–926.

  • Klyubin, A., Polani, D., and Nehaniv, C., (2007). Representations of Space and Time in the Maximization of Information Flow in the Perception-Action Loop. Neural Computation, 19(9):2387–2432.

  • Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2004). Organization of the Information Flow in the Perception-Action Loop of Evolved Agents. In Proceedings of 2004 NASA/DoD Conference on Evolvable Hardware, 177–180. IEEE Computer Society.

  • Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2005a). All Else Being Equal Be Empowered. In Advances in Artificial Life, European Conference on Artificial Life (ECAL 2005), vol. 3630 of LNAI, 744–753. Berlin: Springer.

  • Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2005b). Empowerment: A Universal Agent-Centric Measure of Control. In Proceedings of the IEEE Congress on Evolutionary Computation, 2–5 September 2005, Edinburgh, Scotland (CEC 2005), 128–135. IEEE.

  • Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2008). Keep Your Options Open: An Information-Based Driving Principle for Sensorimotor Systems. PLoS ONE, 3(12):e4018. URL: http://dx.doi.org/10.1371/journal.pone.0004018, Dec 2008.

  • Laughlin, S. B., (2001). Energy as a constraint on the coding and processing of sensory information. Current Opinion in Neurobiology, 11:475–480.

  • Lizier, J., Prokopenko, M., and Zomaya, A., (2007). Detecting non-trivial computation in complex dynamics. In Almeida e Costa, F., Rocha, L. M., Costa, E., Harvey, I., and Coutinho, A., editors, Advances in Artificial Life (Proceedings of the ECAL 2007, Lisbon), vol. 4648 of LNCS, 895–904. Berlin: Springer.

  • Lungarella, M., and Sporns, O., (2005). Information Self-Structuring: Key Principle for Learning and Development. In Proceedings of 4th IEEE International Conference on Development and Learning, 25–30. IEEE.

  • Lungarella, M., and Sporns, O., (2006). Mapping Information Flow in Sensorimotor Networks. PLoS Computational Biology, 2(10):e144.

  • Massey, J., (1990). Causality, feedback and directed information. In Proceedings of the International Symposium on Information Theory and its Applications (ISITA-90), 303–305.

  • McAllester, D. A., (1999). PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, Santa Cruz, CA, 164–170. New York: ACM.

  • Pearl, J., (2000). Causality: Models, Reasoning and Inference. Cambridge, UK: Cambridge University Press.

  • Pfeifer, R., and Bongard, J., (2007). How the Body Shapes the Way We Think: A New View of Intelligence. Bradford Books.

  • Polani, D., (2009). Information: Currency of Life? HFSP Journal, 3(5):307–316. URL: http://link.aip.org/link/?HFS/3/307/1, Nov 2009.

  • Polani, D., Martinetz, T., and Kim, J., (2001). An Information-Theoretic Approach for the Quantification of Relevance. In Kelemen, J., and Sosik, P., editors, Advances in Artificial Life (Proceedings of the 6th European Conference on Artificial Life), vol. 2159 of LNAI, 704–713. Berlin: Springer.

  • Polani, D., Nehaniv, C., Martinetz, T., and Kim, J. T., (2006). Relevant Information in Optimized Persistence vs. Progeny Strategies. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proceedings of Artificial Life X, 337–343.

  • Prokopenko, M., Gerasimov, V., and Tanev, I., (2006). Evolving Spatiotemporal Coordination in a Modular Robotic System. In Nolfi, S., Baldassarre, G., Calabretta, R., Hallam, J. C. T., Marocco, D., Meyer, J.-A., Miglino, O., and Parisi, D., editors, From Animals to Animats 9: 9th International Conference on the Simulation of Adaptive Behavior (SAB 2006), Rome, Italy, vol. 4095 of Lecture Notes in Computer Science, 558–569. Berlin: Springer.

  • Rubin, J., Shamir, O., and Tishby, N., (2010). A PAC-Bayesian Analysis of Reinforcement Learning. In Proceedings of AISTAT 2010.

  • Saerens, M., Achbany, Y., Fuss, F., and Yen, L., (2009). Randomized Shortest-Path Problems: Two Related Models. Neural Computation, 21:2363–2404.

  • Seldin, Y., and Tishby, N., (2009). PAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AIStats 2009), vol. 5 of JMLR Workshop and Conference Proceedings.

  • Shalizi, C. R., and Crutchfield, J. P., (2002). Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction. Advances in Complex Systems, 5:1–5.

  • Shannon, C. E., (1949). The Mathematical Theory of Communication. In Shannon, C. E., and Weaver, W., editors, The Mathematical Theory of Communication. Urbana: The University of Illinois Press.

  • Slonim, N., Friedman, N., and Tishby, N., (2006). Multivariate Information Bottleneck. Neural Computation, 18(8):1739–1789.

  • Sporns, O., and Lungarella, M., (2006). Evolving coordinated behavior by maximizing information structure. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proceedings of Artificial Life X, 323–329.

  • Still, S., (2009). Information-theoretic approach to interactive learning. EPL (Europhysics Letters), 85(2):28005–28010.

  • Strens, M., (2000). A Bayesian Framework for Reinforcement Learning. In Langley, P., editor, Proceedings of the 17th International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 – July 2, 2000. Morgan Kaufmann.

  • Sutton, R. S., and Barto, A. G., (1998). Reinforcement Learning. Cambridge, Mass.: MIT.

  • Taylor, S. F., Tishby, N., and Bialek, W., (2007). Information and Fitness. arXiv.org:0712.4382 [q-bio.PE].

  • Tishby, N., Pereira, F. C., and Bialek, W., (1999). The Information Bottleneck Method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, Illinois. Urbana-Champaign.

  • Todorov, E., (2009). Efficient computation of optimal actions. PNAS, 106(28):11478–11483.

  • Touchette, H., and Lloyd, S., (2000). Information-Theoretic Limits of Control. Physical Review Letters, 84:1156.

  • Touchette, H., and Lloyd, S., (2004). Information-theoretic approach to the study of control systems. Physica A, 331:140–172.

  • van Dijk, S. G., Polani, D., and Nehaniv, C. L., (2009). Hierarchical Behaviours: Getting the Most Bang for your Bit. In Kampis, G., and Szathmáry, E., editors, Proceedings of the European Conference on Artificial Life 2009, Budapest. Springer.

  • Vergassola, M., Villermaux, E., and Shraiman, B. I., (2007). ‘Infotaxis’ as a strategy for searching without gradients. Nature, 445:406–409.

  • Wennekers, T., and Ay, N., (2005). Finite State Automata Resulting From Temporal Information Maximization. Neural Computation, 17(10):2258–2290.

Acknowledgements

The authors would like to thank Jonathan Rubin for carrying out the simulations and the preparation of the corresponding diagrams.

Author information

Corresponding author

Correspondence to Naftali Tishby.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Tishby, N., Polani, D. (2011). Information Theory of Decisions and Actions. In: Cutsuridis, V., Hussain, A., Taylor, J. (eds) Perception-Action Cycle. Springer Series in Cognitive and Neural Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1452-1_19
