Deep Ordinal Reinforcement Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11908)

Abstract

Reinforcement learning usually relies on numerical rewards, which have convenient properties but also come with drawbacks and difficulties. Rewards on an ordinal scale (ordinal rewards) are an alternative that has received growing attention in recent years. In this paper, we present and motivate a general approach to adapting reinforcement learning problems to the use of ordinal rewards. We show how to convert common reinforcement learning algorithms to an ordinal variant, using Q-learning as an example, and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we evaluate our approach on problems provided by the OpenAI Gym framework, showing that our ordinal variants perform comparably to the numerical versions on a number of problems. We also give first evidence that our ordinal variant can produce better results for problems with less engineered, simpler-to-design reward signals.
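To make the idea concrete, the following is a minimal, illustrative sketch of a tabular ordinal Q-learning variant in the spirit of the abstract: each state-action pair keeps an empirical distribution over ordinal reward ranks, and actions are compared via a pairwise "statistical superiority" score instead of a numerical expected return. The class name, the way the bootstrap target is blended, and all hyperparameters are assumptions made for illustration only, not the paper's exact formulation.

```python
import numpy as np

class OrdinalQTable:
    """Tabular agent that learns a distribution over ordinal reward ranks
    for every state-action pair instead of a scalar Q-value (sketch only)."""

    def __init__(self, n_states, n_actions, n_ranks, alpha=0.1, gamma=0.95):
        # dist[s, a, k]: estimated probability that taking action a in state s
        # leads to ordinal rank k (0 = worst, n_ranks - 1 = best).
        self.dist = np.full((n_states, n_actions, n_ranks), 1.0 / n_ranks)
        self.alpha = alpha    # learning rate (assumed value)
        self.gamma = gamma    # discount-like mixing weight (assumed value)

    def superiority(self, s):
        # For each action a, average probability that a's rank beats another
        # action's rank (ties counted half) -- one possible ordinal criterion.
        d = self.dist[s]                       # shape: (n_actions, n_ranks)
        cdf = np.cumsum(d, axis=1)             # P(rank <= k) for each action
        n_actions = d.shape[0]
        wins = np.zeros(n_actions)
        for a in range(n_actions):
            for b in range(n_actions):
                if a == b:
                    continue
                p_below = np.concatenate(([0.0], cdf[b][:-1]))  # P(rank_b < k)
                wins[a] += np.sum(d[a] * p_below) + 0.5 * np.sum(d[a] * d[b])
        return wins / (n_actions - 1)

    def act(self, s, epsilon=0.1):
        # Epsilon-greedy over the superiority score.
        if np.random.rand() < epsilon:
            return np.random.randint(self.dist.shape[1])
        return int(np.argmax(self.superiority(s)))

    def update(self, s, a, rank, s_next, done):
        # Target: observed rank as a one-hot distribution, blended with the
        # rank distribution of the greedy action in the successor state.
        target = np.zeros(self.dist.shape[2])
        target[rank] = 1.0
        if not done:
            a_next = int(np.argmax(self.superiority(s_next)))
            target = (1 - self.gamma) * target + self.gamma * self.dist[s_next, a_next]
        # Convex update keeps dist[s, a] a valid probability distribution.
        self.dist[s, a] += self.alpha * (target - self.dist[s, a])
```

A deep analogue in the direction of Ordinal Deep Q-Networks would, under the same assumptions, replace the table `dist` with a neural network that outputs one rank distribution per action and reuse the superiority score for action selection.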


Notes

  1. This technique of modifying the Q-learning algorithm to deal with rewards on an ordinal scale can analogously be applied to other Q-table-based reinforcement learning algorithms such as Sarsa and Sarsa-\(\lambda\) [14].

  2. The source code for the implementation of the experiments can be found at https://github.com/az79nefy/OrdinalRL.

  3. For further information about OpenAI Gym, visit https://gym.openai.com.

  4. Further technical details about the CartPole and Acrobot environments from OpenAI Gym can be found at https://gym.openai.com/envs/CartPole-v0/ and https://gym.openai.com/envs/Acrobot-v1/; a minimal usage sketch follows these notes.
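As a point of reference, the snippet below shows how one of these environments is typically instantiated and stepped with the classic OpenAI Gym API that was current for these environment versions; the random policy is only a placeholder, and API details may differ in newer Gym/Gymnasium releases.

```python
# Illustrative sketch only: running a single random episode on CartPole-v0
# with the legacy gym API (reset() returns obs, step() returns a 4-tuple).
import gym

env = gym.make("CartPole-v0")      # "Acrobot-v1" works the same way
obs = env.reset()
done = False
total_steps = 0
while not done:
    action = env.action_space.sample()          # placeholder for a learned agent
    obs, reward, done, info = env.step(action)  # reward could be mapped to an ordinal rank
    total_steps += 1
env.close()
print(f"episode length: {total_steps}")
```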

References

  1. Fürnkranz, J., Hüllermeier, E. (eds.): Preference Learning. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-14125-6

  2. Gilbert, H., Weng, P.: Quantile reinforcement learning. CoRR abs/1611.00862 (2016)

  3. van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016), pp. 2094–2100. AAAI Press (2016)

  4. Joppen, T., Fürnkranz, J.: Ordinal Monte Carlo tree search. CoRR abs/1901.04274 (2019)

  5. Lin, L.J.: Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, USA (1992). UMI Order No. GAX93-22750

  6. Mnih, V., et al.: Playing Atari with deep reinforcement learning. CoRR abs/1312.5602 (2013)

  7. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  8. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning, 2nd edn. MIT Press, Cambridge (2018)

  9. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992)

  10. Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings of the 21st International Conference on Automated Planning and Scheduling (ICAPS 2011). AAAI Press, Freiburg (2011)

  11. Weng, P.: Ordinal decision models for Markov decision processes. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pp. 828–833. IOS Press, Montpellier (2012)

  12. Weng, P., Busa-Fekete, R., Hüllermeier, E.: Interactive Q-learning with ordinal rewards and unreliable tutor. In: Proceedings of the ECML/PKDD-13 Workshop on Reinforcement Learning from Generalized Feedback: Beyond Numeric Rewards (2013)

  13. Wirth, C., Akrour, R., Neumann, G., Fürnkranz, J.: A survey of preference-based reinforcement learning methods. J. Mach. Learn. Res. 18(136), 1–46 (2017)

  14. Zap, A.: Ordinal reinforcement learning. Master's thesis, Technische Universität Darmstadt (2019, to appear)

Acknowledgements

This work was supported by the DFG. Calculations for this research were conducted on the Lichtenberg high-performance computer of TU Darmstadt.

Author information

Corresponding author

Correspondence to Tobias Joppen.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zap, A., Joppen, T., Fürnkranz, J. (2020). Deep Ordinal Reinforcement Learning. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11908. Springer, Cham. https://doi.org/10.1007/978-3-030-46133-1_1

  • DOI: https://doi.org/10.1007/978-3-030-46133-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46132-4

  • Online ISBN: 978-3-030-46133-1

  • eBook Packages: Computer Science (R0)
