Abstract
Deep reinforcement learning has over the past few years shown great potential for learning near-optimal control in complex simulated environments with little visual information. Rainbow (Q-learning) and PPO (policy optimisation) have shown outstanding performance on a variety of tasks, including the Atari 2600, MuJoCo, and Roboschool test suites. Although these algorithms are fundamentally different, both suffer from high variance, low sample efficiency, and hyperparameter sensitivity that, in practice, make them unsuitable for mission-critical operations in industry.
Model-based reinforcement learning, on the other hand, focuses on learning the transition dynamics between states in an environment. If the environment dynamics are adequately learned, a model-based approach is perhaps the most sample-efficient way for an agent to learn to act optimally in an environment. These traits make model-based reinforcement learning ideal for real-world environments where sampling is slow and for mission-critical operations. In the warehouse industry, there is increasing motivation to minimise time and maximise production. The literature suggests that in many of these environments, autonomous agents follow handcrafted policies and act suboptimally in a significant portion of the state space.
In this paper, we present the Dreaming Variational Autoencoder v2 (DVAE-2), a model-based reinforcement learning algorithm that increases sample efficiency, thereby enabling algorithms with low sample efficiency to function better in real-world environments. We introduce the Deep Warehouse environment for industry-near testing of autonomous agents in logistic warehouses. We show that the DVAE-2 algorithm improves sample efficiency in the Deep Warehouse environment compared to model-free methods.
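The DVAE-2 architecture itself is not detailed in this abstract. To illustrate the general principle the abstract appeals to — fit a model of the transition dynamics from real samples, then train the agent on additional imagined rollouts from that model — here is a minimal Dyna-style sketch on a toy chain environment. The environment, the tabular model, and all parameters below are illustrative assumptions, not the authors' implementation:

```python
import random
from collections import defaultdict

random.seed(0)

# Toy chain environment: states 0..N-1, actions -1/+1, reward 1 on reaching the right end.
N = 8

def step(s, a):
    s2 = max(0, min(N - 1, s + a))
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

# Optimistic initialisation so the greedy policy keeps trying untried actions.
Q = defaultdict(lambda: 1.0)
model = {}                            # learned deterministic model: (s, a) -> (r, s2, done)
alpha, gamma, eps, n_plan = 0.5, 0.95, 0.1, 20
acts = [-1, 1]

def update(s, a, r, s2, done):
    target = r if done else r + gamma * max(Q[(s2, b)] for b in acts)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

for episode in range(30):
    s, done = 0, False
    while not done:
        a = random.choice(acts) if random.random() < eps else max(acts, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        update(s, a, r, s2, done)     # learn from the real sample
        model[(s, a)] = (r, s2, done) # fit the (here: tabular) dynamics model
        for _ in range(n_plan):       # "dream": replay imagined transitions from the model
            (ps, pa), (pr, ps2, pd) = random.choice(list(model.items()))
            update(ps, pa, pr, ps2, pd)
        s = s2
```

The agent learns from each real transition once, then performs `n_plan` extra updates on transitions replayed from the learned model; this reuse of the model in place of real samples is the mechanism by which model-based methods reduce the number of environment interactions needed.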
Notes
- 1. \(\mathcal {S}\) and \(\mathcal {A}\) are defined for discrete or continuous spaces. \(r: \mathcal {S} \times \mathcal {A} \rightarrow \mathbb {R}\), where \(r\) is commonly referred to as \(\mathcal {R}(s, s')\) in the literature.
- 2. In this setting, the lowest score denotes the technique with the least accumulated error.
- 3. We use the mean squared error (MSE) loss in our implementation.
- 4. The deep warehouse environment is open-source and freely available at https://github.com/cair/deep-warehouse.
- 5. We consider large experiments to be those in which the agents require significant sampling to converge.
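Note 3 above states that the implementation uses the mean squared error loss. For reference, a generic definition (this sketch is not the authors' code):

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of numbers."""
    assert len(pred) == len(target), "sequences must have the same length"
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```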
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Andersen, PA., Goodwin, M., Granmo, OC. (2019). Towards Model-Based Reinforcement Learning for Industry-Near Environments. In: Bramer, M., Petridis, M. (eds) Artificial Intelligence XXXVI. SGAI 2019. Lecture Notes in Computer Science(), vol 11927. Springer, Cham. https://doi.org/10.1007/978-3-030-34885-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34884-7
Online ISBN: 978-3-030-34885-4