
Towards Model-Based Reinforcement Learning for Industry-Near Environments

  • Conference paper
Artificial Intelligence XXXVI (SGAI 2019)

Abstract

Deep reinforcement learning has over the past few years shown great potential in learning near-optimal control in complex simulated environments with little visible information. Rainbow (Q-Learning) and PPO (Policy Optimisation) have shown outstanding performance in a variety of tasks, including the Atari 2600, MuJoCo, and Roboschool test suites. Although these algorithms are fundamentally different, both suffer from high variance, low sample efficiency, and hyperparameter sensitivity, which in practice make them unsuitable for critical operations in industry.

On the other hand, model-based reinforcement learning focuses on learning the transition dynamics between states in an environment. If the environment dynamics are adequately learned, a model-based approach is perhaps the most sample-efficient way for learning agents to act optimally in an environment. These traits make model-based reinforcement learning well suited to real-world environments where sampling is slow, and to mission-critical operations. In the warehouse industry, there is an increasing motivation to minimise time and to maximise production. The literature suggests that in many of these environments, autonomous agents act suboptimally, relying on handcrafted policies for a significant portion of the state space. (A minimal sketch of this model-based loop is given after the abstract.)

In this paper, we present the Dreaming Variational Autoencoder v2 (DVAE-2), a model-based reinforcement learning algorithm that increases sample efficiency, enabling algorithms with low sample efficiency to function better in real-world environments. We introduce the Deep Warehouse environment for industry-near testing of autonomous agents in logistics warehouses. We illustrate that the DVAE-2 algorithm improves sample efficiency in the Deep Warehouse compared to model-free methods. (An illustrative sketch of such a variational transition model also follows below.)
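To make the model-based idea in the abstract concrete, the following is a minimal sketch, not the paper's method: a count-based tabular dynamics model for a small discrete environment, planned over with value iteration. All names, sizes, and the tabular representation are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of model-based RL in a small discrete environment (illustrative
# assumptions throughout; this is not the DVAE-2 algorithm). Real transitions are
# used only to fit an empirical model of P(s' | s, a) and r(s, a); planning then
# happens entirely inside that model, which is where the sample-efficiency gain
# of model-based methods comes from.

n_states, n_actions, gamma = 16, 4, 0.95
counts = np.zeros((n_states, n_actions, n_states))    # transition counts
reward_sum = np.zeros((n_states, n_actions))           # accumulated rewards

def record(s, a, r, s_next):
    """Update the empirical model from one real environment transition."""
    counts[s, a, s_next] += 1
    reward_sum[s, a] += r

def plan(iters=100):
    """Value iteration inside the learned model; no further environment samples."""
    visits = counts.sum(axis=2, keepdims=True)
    P = counts / np.maximum(visits, 1)               # estimated P(s' | s, a)
    R = reward_sum / np.maximum(visits[..., 0], 1)   # estimated mean r(s, a)
    V = np.zeros(n_states)
    for _ in range(iters):
        V = np.max(R + gamma * (P @ V), axis=1)      # Bellman optimality backup
    return V
```

The point of the sketch is that real samples are only needed to fit counts and reward_sum; planning can then be repeated inside the learned model at no additional sampling cost, which is the sample-efficiency argument the abstract makes.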
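As a rough illustration of the kind of variational transition model the abstract refers to, here is a minimal PyTorch sketch of a VAE that encodes a state-action pair and decodes a predicted next state. The layer sizes, the flat state encoding, and the unweighted loss are assumptions made for illustration; this is not the DVAE-2 architecture from the paper.

```python
import torch
import torch.nn as nn

class TransitionVAE(nn.Module):
    """Encode (state, action) into a latent code and decode a predicted next state.

    Illustrative only: layer sizes and the flat state representation are assumed,
    not taken from the DVAE-2 paper.
    """

    def __init__(self, state_dim, action_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim + action_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim),
        )

    def forward(self, state, action):
        h = self.encoder(torch.cat([state, action], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return self.decoder(z), mu, logvar

def vae_loss(pred_next, next_state, mu, logvar):
    # MSE reconstruction term (the paper's notes mention an MSE loss) plus the
    # standard KL regulariser towards a unit Gaussian prior.
    recon = ((pred_next - next_state) ** 2).mean()
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

Once such a model predicts next states well enough, an agent can be trained largely on imagined transitions drawn from the model rather than on slow, real-world samples.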


Notes

  1. \(\mathcal {S}\) and \(\mathcal {A}\) are defined for discrete or continuous spaces. \(r: \mathcal {S} \times \mathcal {A} \rightarrow \mathbb {R}\), where \(r\) is commonly referred to as \(\mathcal {R}(s, s')\) in the literature; the relation between the two notations is written out after these notes.

  2. In this setting, the lowest score corresponds to the technique with the least accumulated error.

  3. We use the mean squared error (MSE) loss in our implementation.

  4. The Deep Warehouse environment is open-source and freely available at https://github.com/cair/deep-warehouse.

  5. We consider large experiments to be those in which the agents require significant sampling to converge.
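For completeness, the relation between the two reward notations in note 1 can be written out explicitly; this is the standard identity under the usual MDP conventions, not an equation taken from the paper. For a discrete state space, with \(P(s' \mid s, a)\) denoting the transition probability,

\[ r(s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\big[\mathcal{R}(s, s')\big] = \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\,\mathcal{R}(s, s'). \]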



Author information

Corresponding author

Correspondence to Per-Arne Andersen.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Andersen, P.-A., Goodwin, M., Granmo, O.-C. (2019). Towards Model-Based Reinforcement Learning for Industry-Near Environments. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXVI. SGAI 2019. Lecture Notes in Computer Science, vol. 11927. Springer, Cham. https://doi.org/10.1007/978-3-030-34885-4_3


  • DOI: https://doi.org/10.1007/978-3-030-34885-4_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34884-7

  • Online ISBN: 978-3-030-34885-4

  • eBook Packages: Computer Science; Computer Science (R0)
