Abstract
Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties. Using rewards on an ordinal scale (ordinal rewards) is an alternative that has received growing attention in recent years. In this paper, we present and motivate a general approach to adapting reinforcement learning problems to the use of ordinal rewards. We show how to convert common reinforcement learning algorithms to an ordinal variation, using Q-learning as an example, and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we run evaluations on problems provided by the OpenAI Gym framework, showing that our ordinal variants exhibit performance comparable to their numerical counterparts on a number of problems. We also give first evidence that our ordinal variant is able to produce better results for problems with less engineered and simpler-to-design reward signals.
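To give a flavor of the idea, the following is a minimal, simplified sketch of what a tabular ordinal Q-learning variant could look like; it is not the paper's exact formulation. The assumptions here: each (state, action) pair stores a probability distribution over a fixed number of ordinal reward ranks instead of a scalar Q-value, the update blends a one-hot distribution of the observed rank with the greedy successor's distribution, and actions are compared via the expected rank as a simple dominance measure. All names (`N_RANKS`, `update`, `best_action`, the blending weight `GAMMA_MIX`) are illustrative, not from the paper.

```python
import random
from collections import defaultdict

# Number of ordinal reward ranks (rank 0 = worst, N_RANKS - 1 = best).
N_RANKS = 3
ALPHA = 0.1      # learning rate
GAMMA_MIX = 0.5  # weight of the successor distribution (stands in for discounting)

def make_dist():
    """Uniform initial distribution over the ordinal ranks."""
    return [1.0 / N_RANKS] * N_RANKS

def score(dist):
    """Expected rank: one simple measure for comparing ordinal distributions."""
    return sum(p * r for r, p in enumerate(dist))

def best_action(q, state, actions):
    """Greedy action selection by the dominance score."""
    return max(actions, key=lambda a: score(q[(state, a)]))

def update(q, state, action, rank, next_state, actions, done):
    """Move the (state, action) distribution toward a target distribution:
    one-hot on the observed ordinal rank, blended with the distribution
    of the greedy action in the successor state."""
    target = [0.0] * N_RANKS
    target[rank] = 1.0
    if not done:
        nxt = q[(next_state, best_action(q, next_state, actions))]
        target = [(1 - GAMMA_MIX) * t + GAMMA_MIX * n for t, n in zip(target, nxt)]
    old = q[(state, action)]
    q[(state, action)] = [(1 - ALPHA) * o + ALPHA * t for o, t in zip(old, target)]
```

Because every update is a convex combination of probability distributions, each stored distribution always sums to one, and no numeric reward magnitudes are ever needed; only the ordering of the ranks matters for action selection.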
Notes
1. This technique of modifying the Q-learning algorithm to deal with rewards on an ordinal scale can analogously be applied to other Q-table based reinforcement learning algorithms like Sarsa and Sarsa-\(\lambda \) [14].
2. The source code for the implementation of the experiments is available at https://github.com/az79nefy/OrdinalRL.
3. For further information about OpenAI Gym, visit https://gym.openai.com.
4. Further technical details about the CartPole and Acrobot environments from OpenAI are available at https://gym.openai.com/envs/CartPole-v0/ and https://gym.openai.com/envs/Acrobot-v1/.
References
Fürnkranz, J., Hüllermeier, E. (eds.): Preference Learning. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-14125-6
Gilbert, H., Weng, P.: Quantile reinforcement learning. CoRR abs/1611.00862 (2016)
Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI 2016, pp. 2094–2100. AAAI Press (2016)
Joppen, T., Fürnkranz, J.: Ordinal Monte Carlo tree search. CoRR abs/1901.04274 (2019)
Lin, L.J.: Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, USA (1992). UMI Order No. GAX93-22750
Mnih, V., et al.: Playing Atari with deep reinforcement learning. CoRR abs/1312.5602 (2013)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Sutton, R.S., Barto, A.G.: Reinforcement Learning - An Introduction. Adaptive Computation and Machine Learning, 2nd edn. MIT Press, Cambridge (2018)
Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992)
Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings of the 21st International Conference on Automated Planning and Scheduling (ICAPS 2011), Freiburg, Germany. AAAI Press (2011)
Weng, P.: Ordinal decision models for Markov decision processes. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pp. 828–833. IOS Press, Montpellier (2012)
Weng, P., Busa-Fekete, R., Hüllermeier, E.: Interactive Q-learning with ordinal rewards and unreliable tutor. In: Proceedings of the ECML/PKDD-13 Workshop on Reinforcement Learning from Generalized Feedback: Beyond Numeric Rewards (2013)
Wirth, C., Akrour, R., Neumann, G., Fürnkranz, J.: A survey of preference-based reinforcement learning methods. J. Mach. Learn. Res. 18(136), 1–46 (2017)
Zap, A.: Ordinal reinforcement learning. Master’s thesis, Technische Universität Darmstadt (2019, to appear)
Acknowledgements
This work was supported by DFG. Calculations for this research were conducted on the Lichtenberg high performance computer of the TU Darmstadt.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zap, A., Joppen, T., Fürnkranz, J. (2020). Deep Ordinal Reinforcement Learning. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol. 11908. Springer, Cham. https://doi.org/10.1007/978-3-030-46133-1_1
DOI: https://doi.org/10.1007/978-3-030-46133-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46132-4
Online ISBN: 978-3-030-46133-1
eBook Packages: Computer Science, Computer Science (R0)