Information Theory of Decisions and Actions

Tishby, Naftali; Polani, Daniel

doi:10.1007/978-1-4419-1452-1_19

Part of the book series: Springer Series in Cognitive and Neural Systems (SSCNS)

Abstract

The perception–action cycle is often defined as “the circular flow of information between an organism and its environment in the course of a sensory guided sequence of actions towards a goal” (Fuster, Neuron 30:319–333, 2001; International Journal of Psychophysiology 60(2):125–132, 2006). The question we address in this chapter is in what sense this “flow of information” can be described by Shannon’s measures of information introduced in his mathematical theory of communication. We provide an affirmative answer to this question using an intriguing analogy between Shannon’s classical model of communication and the perception–action cycle. In particular, decision and action sequences turn out to be directly analogous to codes in communication, and their complexity – the minimal number of (binary) decisions required for reaching a goal – directly bounded by information measures, as in communication. This analogy allows us to extend the standard reinforcement learning framework. The latter considers the future expected reward in the course of a behaviour sequence towards a goal (value-to-go). Here, we additionally incorporate a measure of information associated with this sequence: the cumulated information processing cost or bandwidth required to specify the future decision and action sequence (information-to-go). Using a graphical model, we derive a recursive Bellman optimality equation for information measures, in analogy to reinforcement learning; from this, we obtain new algorithms for calculating the optimal trade-off between the value-to-go and the required information-to-go, unifying the ideas behind the Bellman and the Blahut–Arimoto iterations. This trade-off between value-to-go and information-to-go provides a complete analogy with the compression–distortion trade-off in source coding. The present new formulation connects seemingly unrelated optimization problems. The algorithm is demonstrated on grid world examples.
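As a rough illustration of the trade-off described above, the following minimal sketch treats only a single decision step rather than the recursive value-to-go and information-to-go of the chapter. It alternates, in the style of a Blahut–Arimoto iteration, between a soft policy π(a|s) ∝ p(a) exp(β Q(s,a)) and its action marginal p(a). The names `soft_policy`, `Q`, and `beta`, the toy utilities, and the one-step simplification are all assumptions made for this example and are not taken from the chapter.

```python
import math


def soft_policy(p_s, Q, beta, iters=200):
    """Alternate between a soft policy and its action marginal.

    p_s:  list of state probabilities (assumed given)
    Q:    Q[s][a], assumed utility of action a in state s
    beta: trade-off parameter; larger beta favours value over information
    """
    n_s, n_a = len(Q), len(Q[0])
    p_a = [1.0 / n_a] * n_a                      # uniform initial marginal
    pi = [[1.0 / n_a] * n_a for _ in range(n_s)]
    for _ in range(iters):
        # Policy update: pi(a|s) proportional to p(a) * exp(beta * Q(s, a)).
        for s in range(n_s):
            m = max(Q[s])                        # shift for numerical stability
            w = [p_a[a] * math.exp(beta * (Q[s][a] - m)) for a in range(n_a)]
            z = sum(w)
            pi[s] = [x / z for x in w]
        # Marginal update: p(a) = sum_s p(s) * pi(a|s).
        p_a = [sum(p_s[s] * pi[s][a] for s in range(n_s)) for a in range(n_a)]
    return pi, p_a


def mutual_information(p_s, pi, p_a):
    """I(S;A) in bits: the information the policy takes from the state."""
    total = 0.0
    for s, ps in enumerate(p_s):
        for a, pa in enumerate(p_a):
            if ps > 0 and pi[s][a] > 0:
                total += ps * pi[s][a] * math.log2(pi[s][a] / pa)
    return total


def expected_value(p_s, pi, Q):
    """Expected one-step utility under the policy."""
    return sum(p_s[s] * pi[s][a] * Q[s][a]
               for s in range(len(Q)) for a in range(len(Q[0])))


if __name__ == "__main__":
    p_s = [0.5, 0.5]                 # toy state distribution
    Q = [[1.0, 0.0], [0.0, 1.0]]     # toy utilities: match action to state
    for beta in (0.0, 1.0, 5.0, 20.0):
        pi, p_a = soft_policy(p_s, Q, beta)
        print(beta, mutual_information(p_s, pi, p_a), expected_value(p_s, pi, Q))
```

Sweeping `beta` from small to large traces a curve from a state-independent policy (no information used, lower value) toward a nearly deterministic one (more information used, higher value), which is the one-step counterpart of the trade-off the abstract describes.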

Notes

1. To simplify the derivations, we will always assume convergence of the rewards and not make use of the usual MDP discount factor; in particular, we assume either episodic tasks or non-episodic (continuing) tasks for which the reward converges. For further discussion, see also the remark in Sect. 19.8.2 on soft policies.
2. Valuable information is not to be confused with the value of information introduced in Howard (1966). While it serves similar purposes, the latter is conceptually different, as it measures the value difference attainable in a decision knowing vs. not knowing the outcome of a given random variable. Stated informally, it could be seen as a “non-information-theoretic conjugate” of valuable information.
3. This is a transition graph and should not be confused with the Bayesian Network Graph.
4. Note that, unless stated otherwise, we always imply that the distributions \(\hat{p}({s}_{t+1}),\hat{p}({s}_{t+2}),\ldots \) as well as \(\hat{\pi }({a}_{t+1}),\hat{\pi }({a}_{t+2}),\ldots \) can be different for different t.
5. The interpretation of a negative information gain is that, in the presence or under the observation of a particular condition, the subsequent distributions are blurred. One caricature example: to solve a crime, one would have a probability distribution sharply concentrated on a particular crime suspect. If additional evidence then excluded that suspect from consideration and reset the distribution to cover all suspects equally, this would be an example of negative information gain.
6. Alternatively, one could minimize \(F^{\pi}\) by setting the gradient of \(F^{\pi}\) with respect to \(p(s_{t+1}\mid s_{t},a_{t})\) to 0, similar to the derivation of (19.27)–(19.29), under the assumption that \(\pi\) is already optimized. This implements the assumption that the adaptation of the environmental channel is “slow”, corresponding to the adaptation of the agent policy.
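The caricature in note 5 can be checked numerically. The sketch below assumes, for illustration only, that information gain is measured as the reduction in Shannon entropy of the belief over suspects; the chapter's own definition may differ, and the four suspects and their probabilities are hypothetical.

```python
import math


def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


# Hypothetical belief over four suspects, sharply concentrated on the first.
before = [0.97, 0.01, 0.01, 0.01]
# After the new evidence, the belief is reset to cover all suspects equally.
after = [0.25, 0.25, 0.25, 0.25]

# Assumed measure: gain = entropy before minus entropy after.
gain = entropy_bits(before) - entropy_bits(after)
print(gain)  # negative: the observation blurred the distribution
```

The uniform belief has the maximal entropy of 2 bits for four outcomes, so any sharper prior yields a negative gain under this measure.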

References

Ashby, W. R., (1956). An Introduction to Cybernetics. London: Chapman & Hall Ltd.
Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E., (2008). Predictive Information and Explorative Behavior of Autonomous Robots. European Physical Journal B, 63:329–339.
Ay, N., and Polani, D., (2008). Information Flows in Causal Networks. Advances in Complex Systems, 11(1):17–41.
Ay, N., and Wennekers, T., (2003). Dynamical Properties of Strongly Interacting Markov Chains. Neural Networks, 16(10):1483–1497.
Berger, T., (2003). Living Information Theory – The 2002 Shannon Lecture. IEEE Information Theory Society Newsletter, 53(1):1,6–19.
Bialek, W., de Ruyter van Steveninck, R. R., and Tishby, N., (2007). Efficient representation as a design principle for neural coding and computation. arXiv.org:0712.4381 [q-bio.NC].
Bialek, W., Nemenman, I., and Tishby, N., (2001). Predictability, complexity and learning. Neural Computation, 13:2409–2463.
Brenner, N., Bialek, W., and de Ruyter van Steveninck, R., (2000). Adaptive rescaling optimizes information transmission. Neuron, 26:695–702.
Cover, T. M., and Thomas, J. A., (1991). Elements of Information Theory. New York: Wiley.
Csiszár, I., and Körner, J., (1986). Information Theory: Coding Theorems for Discrete Memoryless Systems. Budapest: Academiai Kiado.
Der, R., Steinmetz, U., and Pasemann, F., (1999). Homeokinesis – A new principle to back up evolution with learning. In Mohammadian, M., editor, Computational Intelligence for Modelling, Control, and Automation, vol. 55 of Concurrent Systems Engineering Series, 43–47. Amsterdam: IOS.
Ellison, C., Mahoney, J., and Crutchfield, J., (2009). Prediction, Retrodiction, and the Amount of Information Stored in the Present. Journal of Statistical Physics, 136(6):1005–1034.
Engel, Y., Mannor, S., and Meir, R., (2003). Bayes meets Bellman: The Gaussian Process Approach to Temporal Difference Learning. In Proceedings of ICML 20, 154–161.
Friston, K., (2009). The free-energy principle: a rough guide to the brain? Trends in Cognitive Sciences, 13(7):293–301.
Friston, K., Kilner, J., and Harrison, L., (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100:70–87.
Fry, R. L., (2008). Computation by Neural and Cortical Systems. Presentation at the Workshop at CNS*2008, Portland, OR: Methods of Information Theory in Computational Neuroscience.
Fuster, J. M., (2001). The Prefrontal Cortex – An Update: Time Is of the Essence. Neuron, 30:319–333.
Fuster, J. M., (2006). The cognit: A network model of cortical representation. International Journal of Psychophysiology, 60(2):125–132.
Gastpar, M., Rimoldi, B., and Vetterli, M., (2003). To Code, or Not to Code: Lossy Source-Channel Communication Revisited. IEEE Transactions on Information Theory, 49(5):1147–1158.
Globerson, A., Stark, E., Vaadia, E., and Tishby, N., (2009). The Minimum Information principle and its application to neural code analysis. PNAS, 106(9):3490–3495.
Haken, H., (1983). Advanced synergetics. Berlin: Springer.
Howard, R. A., (1966). Information value theory. IEEE Transactions on Systems Science and Cybernetics, SSC-2:22–26.
Jung, T., and Polani, D., (2007). Kernelizing LSPE(λ). In Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, April 1–5, Hawaii, 338–345.
Kappen, B., Gomez, V., and Opper, M., (2009). Optimal control as a graphical model inference problem. arXiv:0901.0633v2 [cs.AI].
Kelly, J. L., (1956). A New Interpretation of Information Rate. Bell System Technical Journal, 35:917–926.
Klyubin, A., Polani, D., and Nehaniv, C., (2007). Representations of Space and Time in the Maximization of Information Flow in the Perception-Action Loop. Neural Computation, 19(9):2387–2432.
Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2004). Organization of the Information Flow in the Perception-Action Loop of Evolved Agents. In Proceedings of 2004 NASA/DoD Conference on Evolvable Hardware, 177–180. IEEE Computer Society.
Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2005a). All Else Being Equal Be Empowered. In Advances in Artificial Life, European Conference on Artificial Life (ECAL 2005), vol. 3630 of LNAI, 744–753. Berlin: Springer.
Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2005b). Empowerment: A Universal Agent-Centric Measure of Control. In Proceedings of the IEEE Congress on Evolutionary Computation, 2–5 September 2005, Edinburgh, Scotland (CEC 2005), 128–135. IEEE.
Klyubin, A. S., Polani, D., and Nehaniv, C. L., (2008). Keep Your Options Open: An Information-Based Driving Principle for Sensorimotor Systems. PLoS ONE, 3(12):e4018. URL: http://dx.doi.org/10.1371/journal.pone.0004018, Dec 2008.
Laughlin, S. B., (2001). Energy as a constraint on the coding and processing of sensory information. Current Opinion in Neurobiology, 11:475–480.
Lizier, J., Prokopenko, M., and Zomaya, A., (2007). Detecting non-trivial computation in complex dynamics. In Almeida e Costa, F., Rocha, L. M., Costa, E., Harvey, I., and Coutinho, A., editors, Advances in Artificial Life (Proceedings of the ECAL 2007, Lisbon), vol. 4648 of LNCS, 895–904. Berlin: Springer.
Lungarella, M., and Sporns, O., (2005). Information Self-Structuring: Key Principle for Learning and Development. In Proceedings of 4th IEEE International Conference on Development and Learning, 25–30. IEEE.
Lungarella, M., and Sporns, O., (2006). Mapping Information Flow in Sensorimotor Networks. PLoS Computational Biology, 2(10):e144.
Massey, J., (1990). Causality, feedback and directed information. In Proceedings of the International Symposium on Information Theory and its Applications (ISITA-90), 303–305.
McAllester, D. A., (1999). PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, Santa Cruz, CA, 164–170. New York: ACM.
Pearl, J., (2000). Causality: Models, Reasoning and Inference. Cambridge, UK: Cambridge University Press.
Pfeifer, R., and Bongard, J., (2007). How the Body Shapes the Way We Think: A New View of Intelligence. Bradford Books.
Polani, D., (2009). Information: Currency of Life? HFSP Journal, 3(5):307–316. URL: http://link.aip.org/link/?HFS/3/307/1, Nov 2009.
Polani, D., Martinetz, T., and Kim, J., (2001). An Information-Theoretic Approach for the Quantification of Relevance. In Kelemen, J., and Sosik, P., editors, Advances in Artificial Life (Proceedings of the 6th European Conference on Artificial Life), vol. 2159 of LNAI, 704–713. Berlin: Springer.
Polani, D., Nehaniv, C., Martinetz, T., and Kim, J. T., (2006). Relevant Information in Optimized Persistence vs. Progeny Strategies. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proceedings of Artificial Life X, 337–343.
Prokopenko, M., Gerasimov, V., and Tanev, I., (2006). Evolving Spatiotemporal Coordination in a Modular Robotic System. In Nolfi, S., Baldassarre, G., Calabretta, R., Hallam, J. C. T., Marocco, D., Meyer, J.-A., Miglino, O., and Parisi, D., editors, From Animals to Animats 9: 9th International Conference on the Simulation of Adaptive Behavior (SAB 2006), Rome, Italy, vol. 4095 of Lecture Notes in Computer Science, 558–569. Berlin: Springer.
Rubin, J., Shamir, O., and Tishby, N., (2010). A PAC-Bayesian Analysis of Reinforcement Learning. In Proceedings of AISTAT 2010.
Saerens, M., Achbany, Y., Fuss, F., and Yen, L., (2009). Randomized Shortest-Path Problems: Two Related Models. Neural Computation, 21:2363–2404.
Seldin, Y., and Tishby, N., (2009). PAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AIStats 2009), vol. 5 of JMLR Workshop and Conference Proceedings.
Shalizi, C. R., and Crutchfield, J. P., (2002). Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction. Advances in Complex Systems, 5:1–5.
Shannon, C. E., (1949). The Mathematical Theory of Communication. In Shannon, C. E., and Weaver, W., editors, The Mathematical Theory of Communication. Urbana: The University of Illinois Press.
Slonim, N., Friedman, N., and Tishby, N., (2006). Multivariate Information Bottleneck. Neural Computation, 18(8):1739–1789.
Sporns, O., and Lungarella, M., (2006). Evolving coordinated behavior by maximizing information structure. In Rocha, L. M., Bedau, M., Floreano, D., Goldstone, R., Vespignani, A., and Yaeger, L., editors, Proceedings of Artificial Life X, 323–329.
Still, S., (2009). Information-theoretic approach to interactive learning. EPL (Europhysics Letters), 85(2):28005–28010.
Strens, M., (2000). A Bayesian Framework for Reinforcement Learning. In Langley, P., editor, Proceedings of the 17th International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 – July 2, 2000. Morgan Kaufmann.
Sutton, R. S., and Barto, A. G., (1998). Reinforcement Learning. Cambridge, Mass.: MIT.
Taylor, S. F., Tishby, N., and Bialek, W., (2007). Information and Fitness. arXiv.org:0712.4382 [q-bio.PE].
Tishby, N., Pereira, F. C., and Bialek, W., (1999). The Information Bottleneck Method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, Illinois. Urbana-Champaign.
Todorov, E., (2009). Efficient computation of optimal actions. PNAS, 106(28):11478–11483.
Touchette, H., and Lloyd, S., (2000). Information-Theoretic Limits of Control. Physical Review Letters, 84:1156.
Touchette, H., and Lloyd, S., (2004). Information-theoretic approach to the study of control systems. Physica A, 331:140–172.
van Dijk, S. G., Polani, D., and Nehaniv, C. L., (2009). Hierarchical Behaviours: Getting the Most Bang for your Bit. In Kampis, G., and Szathmáry, E., editors, Proceedings of the European Conference on Artificial Life 2009, Budapest. Springer.
Vergassola, M., Villermaux, E., and Shraiman, B. I., (2007). ‘Infotaxis’ as a strategy for searching without gradients. Nature, 445:406–409.
Wennekers, T., and Ay, N., (2005). Finite State Automata Resulting From Temporal Information Maximization. Neural Computation, 17(10):2258–2290.

Acknowledgements

The authors would like to thank Jonathan Rubin for carrying out the simulations and the preparation of the corresponding diagrams.

Author information

Authors and Affiliations

School of Engineering and Computer Science, Interdisciplinary Center for Neural Computation, The Sudarsky Center for Computational Biology, Hebrew University Jerusalem, Jerusalem, Israel
Naftali Tishby

Authors

Naftali Tishby
Daniel Polani

Corresponding author

Correspondence to Naftali Tishby.

Editor information

Editors and Affiliations

Department of Psychology, Boston University, Boston, 02215, USA
Vassilis Cutsuridis
Dept. Computing Science, University of Stirling, Stirling, FK9 4LA, United Kingdom
Amir Hussain
King's College London, Dept. Mathematics, University of London, London, WC2R 2LS, United Kingdom
John G. Taylor

About this chapter

Cite this chapter

Tishby, N., Polani, D. (2011). Information Theory of Decisions and Actions. In: Cutsuridis, V., Hussain, A., Taylor, J. (eds) Perception-Action Cycle. Springer Series in Cognitive and Neural Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1452-1_19

DOI: https://doi.org/10.1007/978-1-4419-1452-1_19
Published: 31 December 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-1451-4
Online ISBN: 978-1-4419-1452-1
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
