Policy Iteration for Learning an Exercise Policy for American Options

Conference paper
Recent Advances in Reinforcement Learning (EWRL 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5323)

Abstract

Options are important financial instruments whose prices are usually determined by computational methods. Computational finance is a compelling application area for reinforcement learning research, where hard sequential decision-making problems abound and have great practical significance. In this paper, we investigate reinforcement learning methods, in particular least squares policy iteration (LSPI), for the problem of learning an exercise policy for American options. We also investigate a method by Tsitsiklis and Van Roy, referred to as FQI. We compare LSPI and FQI with LSM, the standard least squares Monte Carlo method from the finance community, evaluating their performance on both real and synthetic data. The results show that the exercise policies discovered by LSPI and FQI gain larger payoffs than those discovered by LSM. Our work shows that solution methods developed in reinforcement learning can advance the state of the art in an important and challenging application area, and demonstrates that computational finance remains an under-explored area for the deployment of reinforcement learning methods.
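
To ground the comparison, the following is a minimal sketch of the LSM baseline (Longstaff and Schwartz [10]): simulate price paths, then work backwards in time, regressing realized continuation values on basis functions of the asset price and exercising whenever the immediate payoff exceeds the fitted continuation value. Every numerical parameter below (initial price S0, strike K, rate r, volatility sigma, the quadratic polynomial basis) is an illustrative assumption, not a value from the paper.

# A minimal LSM sketch for an American put; all parameters are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

S0, K, r, sigma, T = 36.0, 40.0, 0.06, 0.2, 1.0   # assumed put parameters
n_steps, n_paths = 50, 100_000
dt = T / n_steps
disc = np.exp(-r * dt)                             # one-step discount factor

# Simulate geometric Brownian motion price paths at times t_1..t_n.
z = rng.standard_normal((n_paths, n_steps))
log_paths = np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1)
S = S0 * np.exp(log_paths)

# Cashflow if the option is held to maturity.
cashflow = np.maximum(K - S[:, -1], 0.0)

# Backward induction: at each earlier date, regress the discounted future
# cashflow on a polynomial basis of the price (in-the-money paths only),
# and exercise where the immediate payoff beats the fitted continuation value.
for t in range(n_steps - 2, -1, -1):
    cashflow *= disc                               # discount one step back
    itm = (K - S[:, t]) > 0.0
    if not itm.any():
        continue                                   # nothing to regress at this date
    x = S[:, t][itm]
    basis = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(basis, cashflow[itm], rcond=None)
    continuation = basis @ coef
    exercise_now = (K - x) > continuation
    idx = np.flatnonzero(itm)[exercise_now]
    cashflow[idx] = K - S[idx, t]                  # exercise replaces future cashflow

price = disc * cashflow.mean()                     # discount from t_1 back to t_0
print(f"LSM estimate of the American put value: {price:.3f}")

Roughly speaking, LSPI and FQI, the reinforcement learning methods investigated in the paper, replace these per-date cross-sectional regressions with value function approximation learned from the same kind of simulated or historical sample paths; the policies they produce are then compared by the payoffs they achieve.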

References

  1. Antos, A., Szepesvári, C., Munos, R.: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71, 89–129 (2008)

  2. Bertsekas, D.P.: Dynamic Programming and Optimal Control. Athena Scientific, Massachusetts (1995)

  3. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Massachusetts (1996)

  4. Bradtke, S.J., Barto, A.G.: Linear least-squares algorithms for temporal difference learning. Machine Learning 22(1–3), 33–57 (1996)

  5. Broadie, M., Detemple, J.B.: Option pricing: valuation models and applications. Management Science 50(9), 1145–1177 (2004)

  6. Duffie, D.: Dynamic Asset Pricing Theory. Princeton University Press, Princeton (2001)

  7. Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, New York (2004)

  8. Hull, J.C.: Options, Futures and Other Derivatives, 6th edn. Prentice Hall, Englewood Cliffs (2006)

  9. Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. Journal of Machine Learning Research 4, 1107–1149 (2003)

  10. Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. The Review of Financial Studies 14(1), 113–147 (2001)

  11. Moody, J., Saffell, M.: Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks 12(4), 875–889 (2001)

  12. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York (1994)

  13. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  14. Tsitsiklis, J.N., Van Roy, B.: Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks 12(4), 694–703 (2001)

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Schuurmans, D. (2008). Policy Iteration for Learning an Exercise Policy for American Options. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_13

  • DOI: https://doi.org/10.1007/978-3-540-89722-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89721-7

  • Online ISBN: 978-3-540-89722-4

  • eBook Packages: Computer Science, Computer Science (R0)
