Deep Ordinal Reinforcement Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11908)

Abstract

Reinforcement learning usually relies on numerical rewards, which have convenient properties but also come with drawbacks and difficulties. Rewards on an ordinal scale (ordinal rewards) are an alternative that has received growing attention in recent years. In this paper, we present and motivate a general approach to adapting reinforcement learning problems to the use of ordinal rewards. We show how to convert common reinforcement learning algorithms to an ordinal variant, using Q-learning as an example, and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we evaluate our approach on problems provided by the OpenAI Gym framework, showing that our ordinal variants perform comparably to the numerical versions on a number of problems. We also give first evidence that our ordinal variant can produce better results for problems with less engineered, simpler-to-design reward signals.
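To make the idea concrete, the following is a minimal, illustrative sketch of a tabular ordinal Q-learning variant in the spirit of the abstract: each state-action pair keeps an empirical distribution over ordinal reward ranks, and actions are compared via a pairwise "statistical superiority" score instead of a numerical expected return. The class name, the way the bootstrap target is blended, and all hyperparameters are assumptions made for illustration only, not the paper's exact formulation.

```python
import numpy as np

class OrdinalQTable:
    """Tabular agent that learns a distribution over ordinal reward ranks
    for every state-action pair instead of a scalar Q-value (sketch only)."""

    def __init__(self, n_states, n_actions, n_ranks, alpha=0.1, gamma=0.95):
        # dist[s, a, k]: estimated probability that taking action a in state s
        # leads to ordinal rank k (0 = worst, n_ranks - 1 = best).
        self.dist = np.full((n_states, n_actions, n_ranks), 1.0 / n_ranks)
        self.alpha = alpha    # learning rate (assumed value)
        self.gamma = gamma    # discount-like mixing weight (assumed value)

    def superiority(self, s):
        # For each action a, average probability that a's rank beats another
        # action's rank (ties counted half) -- one possible ordinal criterion.
        d = self.dist[s]                       # shape: (n_actions, n_ranks)
        cdf = np.cumsum(d, axis=1)             # P(rank <= k) for each action
        n_actions = d.shape[0]
        wins = np.zeros(n_actions)
        for a in range(n_actions):
            for b in range(n_actions):
                if a == b:
                    continue
                p_below = np.concatenate(([0.0], cdf[b][:-1]))  # P(rank_b < k)
                wins[a] += np.sum(d[a] * p_below) + 0.5 * np.sum(d[a] * d[b])
        return wins / (n_actions - 1)

    def act(self, s, epsilon=0.1):
        # Epsilon-greedy over the superiority score.
        if np.random.rand() < epsilon:
            return np.random.randint(self.dist.shape[1])
        return int(np.argmax(self.superiority(s)))

    def update(self, s, a, rank, s_next, done):
        # Target: observed rank as a one-hot distribution, blended with the
        # rank distribution of the greedy action in the successor state.
        target = np.zeros(self.dist.shape[2])
        target[rank] = 1.0
        if not done:
            a_next = int(np.argmax(self.superiority(s_next)))
            target = (1 - self.gamma) * target + self.gamma * self.dist[s_next, a_next]
        # Convex update keeps dist[s, a] a valid probability distribution.
        self.dist[s, a] += self.alpha * (target - self.dist[s, a])
```

A deep analogue in the direction of Ordinal Deep Q-Networks would, under the same assumptions, replace the table `dist` with a neural network that outputs one rank distribution per action and reuse the superiority score for action selection.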


Notes

  1. This technique of modifying the Q-learning algorithm to deal with rewards on an ordinal scale can analogously be applied to other Q-table-based reinforcement learning algorithms such as Sarsa and Sarsa-\(\lambda\) [14].

  2. The source code for the implementation of the experiments can be found at https://github.com/az79nefy/OrdinalRL.

  3. For further information about OpenAI Gym, visit https://gym.openai.com.

  4. Further technical details about the CartPole and Acrobot environments from OpenAI Gym can be found at https://gym.openai.com/envs/CartPole-v0/ and https://gym.openai.com/envs/Acrobot-v1/; a minimal usage sketch follows these notes.
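As a point of reference, the snippet below shows how one of these environments is typically instantiated and stepped with the classic OpenAI Gym API that was current for these environment versions; the random policy is only a placeholder, and API details may differ in newer Gym/Gymnasium releases.

```python
# Illustrative sketch only: running a single random episode on CartPole-v0
# with the legacy gym API (reset() returns obs, step() returns a 4-tuple).
import gym

env = gym.make("CartPole-v0")      # "Acrobot-v1" works the same way
obs = env.reset()
done = False
total_steps = 0
while not done:
    action = env.action_space.sample()          # placeholder for a learned agent
    obs, reward, done, info = env.step(action)  # reward could be mapped to an ordinal rank
    total_steps += 1
env.close()
print(f"episode length: {total_steps}")
```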

References

  1. Fürnkranz, J., Hüllermeier, E. (eds.): Preference Learning. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-14125-6

  2. Gilbert, H., Weng, P.: Quantile reinforcement learning. CoRR abs/1611.00862 (2016)

  3. van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016), pp. 2094–2100. AAAI Press (2016)

  4. Joppen, T., Fürnkranz, J.: Ordinal Monte Carlo tree search. CoRR abs/1901.04274 (2019)

  5. Lin, L.J.: Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA, USA (1992). UMI Order No. GAX93-22750

  6. Mnih, V., et al.: Playing Atari with deep reinforcement learning. CoRR abs/1312.5602 (2013)

  7. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  8. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning, 2nd edn. MIT Press, Cambridge (2018)

  9. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992)

  10. Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings of the 21st International Conference on Automated Planning and Scheduling (ICAPS 2011). AAAI Press, Freiburg (2011)

  11. Weng, P.: Ordinal decision models for Markov decision processes. In: Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), pp. 828–833. IOS Press, Montpellier (2012)

  12. Weng, P., Busa-Fekete, R., Hüllermeier, E.: Interactive Q-learning with ordinal rewards and unreliable tutor. In: Proceedings of the ECML/PKDD-13 Workshop on Reinforcement Learning from Generalized Feedback: Beyond Numeric Rewards (2013)

  13. Wirth, C., Akrour, R., Neumann, G., Fürnkranz, J.: A survey of preference-based reinforcement learning methods. J. Mach. Learn. Res. 18(136), 1–46 (2017)

  14. Zap, A.: Ordinal reinforcement learning. Master's thesis, Technische Universität Darmstadt (2019, to appear)

Acknowledgements

This work was supported by the DFG. Calculations for this research were conducted on the Lichtenberg high-performance computer of TU Darmstadt.

Author information

Corresponding author

Correspondence to Tobias Joppen.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Zap, A., Joppen, T., Fürnkranz, J. (2020). Deep Ordinal Reinforcement Learning. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11908. Springer, Cham. https://doi.org/10.1007/978-3-030-46133-1_1

  • DOI: https://doi.org/10.1007/978-3-030-46133-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46132-4

  • Online ISBN: 978-3-030-46133-1

  • eBook Packages: Computer Science (R0)
