Abstract
The perception–action cycle is often defined as “the circular flow of information between an organism and its environment in the course of a sensory guided sequence of actions towards a goal” (Fuster, Neuron 30:319–333, 2001; International Journal of Psychophysiology 60(2):125–132, 2006). The question we address in this chapter is in what sense this “flow of information” can be described by Shannon’s measures of information introduced in his mathematical theory of communication. We provide an affirmative answer to this question using an intriguing analogy between Shannon’s classical model of communication and the perception–action cycle. In particular, decision and action sequences turn out to be directly analogous to codes in communication, and their complexity – the minimal number of (binary) decisions required for reaching a goal – directly bounded by information measures, as in communication. This analogy allows us to extend the standard reinforcement learning framework. The latter considers the future expected reward in the course of a behaviour sequence towards a goal (value-to-go). Here, we additionally incorporate a measure of information associated with this sequence: the cumulated information processing cost or bandwidth required to specify the future decision and action sequence (information-to-go). Using a graphical model, we derive a recursive Bellman optimality equation for information measures, in analogy to reinforcement learning; from this, we obtain new algorithms for calculating the optimal trade-off between the value-to-go and the required information-to-go, unifying the ideas behind the Bellman and the Blahut–Arimoto iterations. This trade-off between value-to-go and information-to-go provides a complete analogy with the compression–distortion trade-off in source coding. The present new formulation connects seemingly unrelated optimization problems. The algorithm is demonstrated on grid world examples.
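As a rough illustration of the kind of trade-off the abstract describes, the following sketch alternates a Blahut–Arimoto-style policy update with a marginal update for a single decision step. It is a minimal sketch under stated assumptions, not the chapter's algorithm: it omits the Bellman recursion over future steps, and the function names (`soft_policy`, `information_bits`, `expected_value`), the reward table `Q`, and the trade-off weight `beta` are all hypothetical choices made for this example.

```python
import numpy as np

_TINY = 1e-300  # floor to keep logarithms and ratios finite


def soft_policy(p_s, Q, beta, n_iter=200):
    """Alternate policy and marginal updates for one decision step.

    p_s  : (S,) distribution over states
    Q    : (S, A) hypothetical reward of action a in state s
    beta : trade-off weight; larger beta favours reward over low information
    """
    n_actions = Q.shape[1]
    p_a = np.full(n_actions, 1.0 / n_actions)  # start from a uniform marginal
    for _ in range(n_iter):
        # Policy update: pi(a|s) proportional to p(a) * exp(beta * Q(s, a))
        logits = np.log(np.maximum(p_a, _TINY))[None, :] + beta * Q
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        # Marginal update: p(a) = sum_s p(s) * pi(a|s)
        p_a = p_s @ pi
    return pi, p_a


def information_bits(p_s, pi, p_a):
    """Mutual information I(S;A) in bits for the joint p(s) * pi(a|s)."""
    ratio = np.where(pi > 0, pi / np.maximum(p_a, _TINY)[None, :], 1.0)
    return float(np.sum(p_s[:, None] * pi * np.log2(ratio)))


def expected_value(p_s, pi, Q):
    """Expected reward under the state distribution and the policy."""
    return float(np.sum(p_s[:, None] * pi * Q))


if __name__ == "__main__":
    # Two states, two actions; the "matching" action is rewarded (made-up data).
    p_s = np.array([0.5, 0.5])
    Q = np.eye(2)
    for beta in (0.0, 1.0, 5.0, 50.0):
        pi, p_a = soft_policy(p_s, Q, beta)
        print(beta, expected_value(p_s, pi, Q), information_bits(p_s, pi, p_a))
```

In this toy setting, a small `beta` yields a policy that ignores the state and needs no information, while a large `beta` yields a nearly deterministic policy that uses close to one bit per decision.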
Notes
- 1.
To simplify the derivations, we will always assume convergence of the rewards and not make use of the usual MDP discount factor; in particular, we assume either episodic tasks or non-episodic (continuing) tasks for which the reward converges. For further discussion, see also the remark in Sect. 19.8.2 on soft policies.
- 2.
Valuable information is not to be confused with the value of information introduced in Howard (1966). Although the latter serves similar purposes, it is conceptually different: it measures the value difference attainable in a decision when knowing vs. not knowing the outcome of a given random variable. Stated informally, it could be seen as a “non-information-theoretic conjugate” of valuable information.
- 3.
This is a transition graph and should not be confused with the Bayesian Network Graph.
- 4.
Note that, unless stated otherwise, we always imply that the distributions \(\hat{p}({s}_{t+1}),\hat{p}({s}_{t+2}),\ldots \) as well as \(\hat{\pi }({a}_{t+1}),\hat{\pi }({a}_{t+2}),\ldots \) can be different for different t.
- 5.
The interpretation of a negative information gain is that, in the presence (or upon observation) of a particular condition, the subsequent distributions are blurred. As a caricature example, suppose that in solving a crime one has a probability distribution sharply concentrated on a particular suspect. If additional evidence then excludes that suspect from consideration and resets the distribution to cover all remaining suspects equally, this would be an example of negative information gain.
- 6.
Alternatively, one could minimize \(F^{\pi}\) by setting the gradient of \(F^{\pi}\) with respect to \(p(s_{t+1}\mid s_{t},a_{t})\) to 0, similar to the derivation of (19.27)–(19.29), under the assumption that \(\pi\) is already optimized. This implements the assumption that the adaptation of the environmental channel is “slow”, corresponding to the adaptation of the agent policy.
References
Ashby, W. R., (1956). An Introduction to Cybernetics. London: Chapman & Hall Ltd.
Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E., (2008). Predictive Information and Explorative Behavior of Autonomous Robots. European Physical Journal B, 63:329–339.
Ay, N., and Polani, D., (2008). Information Flows in Causal Networks. Advances in Complex Systems, 11(1):17–41.
Ay, N., and Wennekers, T., (2003). Dynamical Properties of Strongly Interacting Markov Chains. Neural Networks, 16(10):1483–1497.
Berger, T., (2003). Living Information Theory – The 2002 Shannon Lecture. IEEE Information Theory Society Newsletter, 53(1):1,6–19.
Bialek, W., de Ruyter van Steveninck, R. R., and Tishby, N., (2007). Efficient representation as a design principle for neural coding and computation. arXiv.org:0712.4381 [q-bio.NC].
Bialek, W., Nemenman, I., and Tishby, N., (2001). Predictability, complexity and learning. Neural Computation, 13:2409–2463.
Brenner, N., Bialek, W., and de Ruyter van Steveninck, R., (2000). Adaptive rescaling optimizes information transmission. Neuron, 26:695–702.
Cover, T. M., and Thomas, J. A., (1991). Elements of Information Theory. New York: Wiley.
Csiszár, I., and Körner, J., (1986). Information Theory: Coding Theorems for Discrete Memoryless Systems. Budapest: Academiai Kiado.
Der, R., Steinmetz, U., and Pasemann, F., (1999). Homeokinesis – A new principle to back up evolution with learning. In Mohammadian, M., editor, Computational Intelligence for Modelling, Control, and Automation, vol. 55 of Concurrent Systems Engineering Series, 43–47. Amsterdam: IOS.
Ellison, C., Mahoney, J., and Crutchfield, J., (2009). Prediction, Retrodiction, and the Amount of Information Stored in the Present. Journal of Statistical Physics, 136(6):1005–1034.
Engel, Y., Mannor, S., and Meir, R., (2003). Bayes meets Bellman: The Gaussian Process Approach to Temporal Difference Learning. In Proceedings of ICML 20, 154–161.
Friston, K., (2009). The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences, 13(7):293–301.
Friston, K., Kilner, J., and Harrison, L., (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100:70–87.
Fry, R. L., (2008). Computation by Neural and Cortical Systems. Presentation at the Workshop at CNS*2008, Portland, OR: Methods of Information Theory in Computational Neuroscience.
Fuster, J. M., (2001). The Prefrontal Cortex – An Update: Time Is of the Essence. Neuron, 30:319–333.
Fuster, J. M., (2006). The cognit: A network model of cortical representation. International Journal of Psychophysiology, 60(2):125–132.
Gastpar, M., Rimoldi, B., and Vetterli, M., (2003). To Code, or Not to Code: Lossy Source-Channel Communication Revisited. IEEE Transactions on Information Theory, 49(5):1147–1158.
Globerson, A., Stark, E., Vaadia, E., and Tishby, N., (2009). The Minimum Information principle and its application to neural code analysis. PNAS, 106(9):3490–3495.
Haken, H., (1983). Advanced synergetics. Berlin: Springer.
Howard, R. A., (1966). Information value theory. IEEE Transactions on Systems Science and Cybernetics, SSC-2:22–26.
Jung, T., and Polani, D., (2007). Kernelizing LSPE(λ). In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, April 1–5, Hawaii, 338–345.
Kappen, B., Gomez, V., and Opper, M., (2009). Optimal control as a graphical model inference problem. arXiv:0901.0633v2 [cs.AI].
Kelly, J. L., (1956). A New Interpretation of Information Rate. Bell System Technical Journal, 35:917–926.
Klyubin, A., Polani, D., and Nehaniv, C., (2007). Representations of Space and Time in the Maximization of Information Flow in the Perception-Action Loop. Neural Computation, 19(9):2387–2432.
Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2004). Organization of the Information Flow in the Perception-Action Loop of Evolved Agents. In Proceedings of 2004 NASA/DoD Conference on Evolvable Hardware, 177–180. IEEE Computer Society.
Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2005a). All Else Being Equal Be Empowered. In Advances in Artificial Life, European Conference on Artificial Life (ECAL 2005), vol. 3630 of LNAI, 744–753. Berlin: Springer.
Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2005b). Empowerment: A Universal Agent-Centric Measure of Control. In Proceedings of the IEEE Congress on Evolutionary Computation, 2–5 September 2005, Edinburgh, Scotland (CEC 2005), 128–135. IEEE.
Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2008). Keep Your Options Open: An Information-Based Driving Principle for Sensorimotor Systems. PLoS ONE, 3(12):e4018. URL: http://dx.doi.org/10.1371/journal.pone.0004018, Dec 2008.
Laughlin, S. B., (2001). Energy as a constraint on the coding and processing of sensory information. Current Opinion in Neurobiology, 11:475–480.
Lizier, J., Prokopenko, M., and Zomaya, A., (2007). Detecting non-trivial computation in complex dynamics. In Almeida e Costa, F., Rocha, L. M., Costa, E., Harvey, I., and Coutinho, A., editors, Advances in Artificial Life (Proceedings of the ECAL 2007, Lisbon), vol. 4648 of LNCS, 895–904. Berlin: Springer.
Lungarella, M., and Sporns, O., (2005). Information Self-Structuring: Key Principle for Learning and Development. In Proceedings of 4th IEEE International Conference on Development and Learning, 25–30. IEEE.
Lungarella, M., and Sporns, O., (2006). Mapping Information Flow in Sensorimotor Networks. PLoS Computational Biology, 2(10):e144.
Massey, J., (1990). Causality, feedback and directed information. In Proceedings of the International Symposium on Information Theory and its Applications (ISITA-90), 303–305.
McAllester, D. A., (1999). PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, Santa Cruz, CA, 164–170. New York: ACM.
Pearl, J., (2000). Causality: Models, Reasoning and Inference. Cambridge, UK: Cambridge University Press.
Pfeifer, R., and Bongard, J., (2007). How the Body Shapes the Way We Think: A New View of Intelligence. Bradford Books.
Polani, D., (2009). Information: Currency of Life? HFSP Journal, 3(5):307–316. URL: http://link.aip.org/link/?HFS/3/307/1, Nov 2009.
Polani, D., Martinetz, T., and Kim, J., (2001). An Information-Theoretic Approach for the Quantification of Relevance. In Kelemen, J., and Sosik, P., editors, Advances in Artificial Life (Proceedings of the 6th European Conference on Artificial Life), vol. 2159 of LNAI, 704–713. Berlin: Springer.
Polani, D., Nehaniv, C., Martinetz, T., and Kim, J. T., (2006). Relevant Information in Optimized Persistence vs. Progeny Strategies. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proceedings of Artificial Life X, 337–343.
Prokopenko, M., Gerasimov, V., and Tanev, I., (2006). Evolving Spatiotemporal Coordination in a Modular Robotic System. In Nolfi, S., Baldassarre, G., Calabretta, R., Hallam, J. C. T., Marocco, D., Meyer, J.-A., Miglino, O., and Parisi, D., editors, From Animals to Animats 9: 9th International Conference on the Simulation of Adaptive Behavior (SAB 2006), Rome, Italy, vol. 4095 of Lecture Notes in Computer Science, 558–569. Berlin: Springer.
Rubin, J., Shamir, O., and Tishby, N., (2010). A PAC-Bayesian Analysis of Reinforcement Learning. In Proceedings of AISTAT 2010.
Saerens, M., Achbany, Y., Fuss, F., and Yen, L., (2009). Randomized Shortest-Path Problems: Two Related Models. Neural Computation, 21:2363–2404.
Seldin, Y., and Tishby, N., (2009). PAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AIStats 2009), vol. 5 of JMLR Workshop and Conference Proceedings.
Shalizi, C. R., and Crutchfield, J. P., (2002). Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction. Advances in Complex Systems, 5:1–5.
Shannon, C. E., (1949). The Mathematical Theory of Communication. In Shannon, C. E., and Weaver, W., editors, The Mathematical Theory of Communication. Urbana: The University of Illinois Press.
Slonim, N., Friedman, N., and Tishby, N., (2006). Multivariate Information Bottleneck. Neural Computation, 18(8):1739–1789.
Sporns, O., and Lungarella, M., (2006). Evolving coordinated behavior by maximizing information structure. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proceedings of Artificial Life X, 323–329.
Still, S., (2009). Information-theoretic approach to interactive learning. EPL (Europhysics Letters), 85(2):28005–28010.
Strens, M., (2000). A Bayesian Framework for Reinforcement Learning. In Langley, P., editor, Proceedings of the 17th International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 – July 2, 2000. Morgan Kaufmann.
Sutton, R. S., and Barto, A. G., (1998). Reinforcement Learning. Cambridge, Mass.: MIT.
Taylor, S. F., Tishby, N., and Bialek, W., (2007). Information and Fitness. arXiv.org:0712.4382 [q-bio.PE].
Tishby, N., Pereira, F. C., and Bialek, W., (1999). The Information Bottleneck Method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, Illinois. Urbana-Champaign.
Todorov, E., (2009). Efficient computation of optimal actions. PNAS, 106(28):11478–11483.
Touchette, H., and Lloyd, S., (2000). Information-Theoretic Limits of Control. Physical Review Letters, 84:1156.
Touchette, H., and Lloyd, S., (2004). Information-theoretic approach to the study of control systems. Physica A, 331:140–172.
van Dijk, S. G., Polani, D., and Nehaniv, C. L., (2009). Hierarchical Behaviours: Getting the Most Bang for your Bit. In Kampis, G., and Szathmáry, E., editors, Proceedings of the European Conference on Artificial Life 2009, Budapest. Springer.
Vergassola, M., Villermaux, E., and Shraiman, B. I., (2007). ‘Infotaxis’ as a strategy for searching without gradients. Nature, 445:406–409.
Wennekers, T., and Ay, N., (2005). Finite State Automata Resulting From Temporal Information Maximization. Neural Computation, 17(10):2258–2290.
Acknowledgements
The authors would like to thank Jonathan Rubin for carrying out the simulations and the preparation of the corresponding diagrams.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Tishby, N., Polani, D. (2011). Information Theory of Decisions and Actions. In: Cutsuridis, V., Hussain, A., Taylor, J. (eds) Perception-Action Cycle. Springer Series in Cognitive and Neural Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1452-1_19
DOI: https://doi.org/10.1007/978-1-4419-1452-1_19
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-1451-4
Online ISBN: 978-1-4419-1452-1
eBook Packages: Biomedical and Life Sciences (R0)