
A Model of External Memory for Navigation in Partially Observable Visual Reinforcement Learning Tasks

  • Conference paper
  • Genetic Programming (EuroGP 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11451)

Included in the conference series: EuroGP (European Conference on Genetic Programming)

Abstract

Visual reinforcement learning implies that decision-making policies are identified under delayed rewards from an environment. Moreover, state information takes the form of high-dimensional data, such as video. In addition, although the video might characterize a 3D world in high resolution, partial observability places significant limits on what the agent can actually perceive of the world. This means that the agent also has to: (1) provide efficient encodings of state, (2) store the encodings of state efficiently in some form of memory, and (3) recall such memories after arbitrary delays for decision making. In this work, we demonstrate how an external memory model facilitates decision making in the complex world of multi-agent ‘deathmatches’ in the ViZDoom first-person shooter environment. ViZDoom provides a complex world of multiple rooms and resources in which agents are spawned at multiple different locations. A unique approach is adopted to defining external memory for genetic programming agents in which: (1) the state of memory is shared across all programs; (2) writing is formulated as a probabilistic process, resulting in different regions of memory having short- versus long-term properties; and (3) read operations are indexed, enabling programs to identify regions of external memory with specific temporal properties. We demonstrate that agents purposefully navigate the world when external memory is provided, whereas those without external memory are limited to mere ‘fight or flight’ behaviour.
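
To make the memory model concrete, the following minimal Python sketch illustrates the three properties listed above. It is a sketch under stated assumptions: the class and parameter names, the 100-column layout, and the Gaussian write distribution are illustrative choices (the short-/long-term index layout follows Note 3 below), not the paper's implementation.

```python
import numpy as np

class SharedExternalMemory:
    """Minimal sketch of a shared, indexed external memory with
    probabilistic writes. Names and the write distribution are
    illustrative assumptions, not the paper's implementation."""

    def __init__(self, columns=100, registers=8, seed=None):
        self.columns = columns
        self.mem = np.zeros((registers, columns))
        self.rng = np.random.default_rng(seed)
        # Assumed write distribution: the probability of overwriting a
        # column peaks at the middle (index ~50) and decays toward the
        # edges (indexes ~1 and ~100). Middle columns are refreshed
        # often (short-term memory); edge columns persist (long-term).
        centre = (columns - 1) / 2.0
        width = columns / 6.0
        self.p_write = np.exp(-0.5 * ((np.arange(columns) - centre) / width) ** 2)

    def write(self, registers):
        # All programs write to the same memory, so state is shared.
        # Each column is independently overwritten with probability
        # p_write[column] by the writing program's register state.
        mask = self.rng.random(self.columns) < self.p_write
        self.mem[:, mask] = np.asarray(registers)[:, None]

    def read(self, index):
        # Indexed read: by choosing the column, a program selects a
        # region with a specific temporal horizon.
        return self.mem[:, index % self.columns]
```

Under these assumptions, a program needing information observed many steps ago would read near columns 1 or 100, while a program tracking the immediate past would read near column 50.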


Notes

  1. Given the larger state space than encountered in the original TPG work, we begin with larger teams, i.e. a state space of \(\approx\)78,000 versus \(\approx\)1,300 in [3, 4].

  2. Both TPG and M-TPG support per-program stateful scalar memory, i.e. a limited form of memory in which programs are unaware of each other’s state.

  3. Short-term memory is located at indexes near ‘50’, long-term at indexes near ‘1’ and ‘100’.

  4. Conversely, deep learning solutions downsampled to an \(84 \times 84\) state space.

References

  1. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  2. Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3215–3222 (2018)

  3. Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds.) EuroGP 2017. LNCS, vol. 10196, pp. 64–79. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55696-3_5

  4. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evol. Comput. 26(3), 347–380 (2018)

  5. Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: ACM Genetic and Evolutionary Computation Conference, pp. 229–236 (2018)

  6. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)

  7. Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR abs/1410.5401 (2014)

  8. Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural Turing machines for reward-based learning. In: ACM Genetic and Evolutionary Computation Conference, pp. 117–124 (2016)

  9. Merrild, J., Rasmussen, M.A., Risi, S.: HyperNTM: evolving scalable neural Turing machines through HyperNEAT. In: Sim, K., Kaufmann, P. (eds.) EvoApplications 2018. LNCS, vol. 10784, pp. 750–766. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77538-8_50

  10. Jaderberg, M., et al.: Human-level performance in first-person multiplayer games with population-based deep reinforcement learning. CoRR abs/1807.01281 (2018)

  11. Nordin, P.: A compiling genetic programming system that directly manipulates the machine code. In: Kinnear, K.E. (ed.) Advances in Genetic Programming, pp. 311–332. MIT Press, Cambridge (1994)

  12. Huelsbergen, L.: Toward simulated evolution of machine language iteration. In: Proceedings of the Annual Conference on Genetic Programming, pp. 315–320 (1996)

  13. Haddadi, F., Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.: Malicious automatically generated domain name detection using stateful-SBB. In: Esparcia-Alcázar, A.I. (ed.) EvoApplications 2013. LNCS, vol. 7835, pp. 529–539. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37192-9_53

  14. Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: Squillero, G., Burelli, P. (eds.) EvoApplications 2016. LNCS, vol. 9597, pp. 19–34. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31204-0_2

  15. Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: IEEE Congress on Evolutionary Computation, pp. 136–141 (1994)

  16. Teller, A.: The evolution of mental models. In: Kinnear, K.E. (ed.) Advances in Genetic Programming, pp. 199–220. MIT Press, Cambridge (1994)

  17. Langdon, W.B.: Genetic Programming and Data Structures. Kluwer Academic, Dordrecht (1998)

  18. Andre, D.: Evolution of mapmaking ability: strategies for the evolution of learning, planning, and memory using genetic programming. In: IEEE World Congress on Computational Intelligence, pp. 250–255 (1994)

  19. Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the Annual Conference on Genetic Programming (1996)

  20. Nordin, P., Banzhaf, W., Brameier, M.: Evolution of a world model for a miniature robot using genetic programming. Robot. Auton. Syst. 25, 105–116 (1998)

  21. Spector, L., Luke, S.: Cultural transmission of information in genetic programming. In: Annual Conference on Genetic Programming, pp. 209–214 (1996)

  22. Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference, pp. 195–202 (2017)

  23. Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)

  24. Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer, New York (2007). https://doi.org/10.1007/978-0-387-31030-5

  25. Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: ViZDoom: a doom-based AI research platform for visual reinforcement learning. In: IEEE Conference on Computational Intelligence and Games, pp. 1–8 (2016)

  26. Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: Castelli, M., Sekanina, L., Zhang, M., Cagnoni, S., García-Sánchez, P. (eds.) EuroGP 2018. LNCS, vol. 10781, pp. 135–150. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77553-1_9

  27. Quiroga, R.Q., Kreiman, G., Koch, C., Fried, I.: Sparse but not ‘grandmother-cell’ coding in the medial temporal lobe. Trends Cogn. Sci. 12(3), 87–91 (2008)


Acknowledgments

This research was supported by NSERC grant CRDJ 499792.

Author information

Corresponding author

Correspondence to Robert J. Smith.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Smith, R.J., Heywood, M.I. (2019). A Model of External Memory for Navigation in Partially Observable Visual Reinforcement Learning Tasks. In: Sekanina, L., Hu, T., Lourenço, N., Richter, H., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2019. Lecture Notes in Computer Science, vol 11451. Springer, Cham. https://doi.org/10.1007/978-3-030-16670-0_11

  • DOI: https://doi.org/10.1007/978-3-030-16670-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16669-4

  • Online ISBN: 978-3-030-16670-0

  • eBook Packages: Computer Science, Computer Science (R0)
