Applications of Reinforcement Learning
This chapter considers real-world applications of reinforcement learning in finance, as well as further advances in the theory presented in the previous chapter. We start with one of the most common problems of quantitative finance: optimal portfolio trading in discrete time. Many practical problems of trading or risk management amount to different forms of dynamic portfolio optimization, with different optimization criteria, portfolio compositions, and constraints. The chapter introduces a reinforcement learning approach to option pricing that uses Q-learning to generalize the classical Black–Scholes model to a data-driven setting. It then presents a probabilistic extension of Q-learning, called G-learning, and shows how it can be used for dynamic portfolio optimization. For certain specifications of the reward function, G-learning is semi-analytically tractable and amounts to a probabilistic version of the linear quadratic regulator (LQR). Detailed analyses of such cases are presented, and their solutions are illustrated with examples from dynamic portfolio optimization and wealth management.
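To make the phrase "probabilistic extension of Q-learning" concrete, here is a minimal tabular sketch of a G-learning-style soft update. It assumes discrete states and actions, a fixed, strictly positive prior policy π0(a|s), and an inverse temperature β > 0. The hard `max` in the Q-learning target is replaced by a free energy F(s) = (1/β) log Σ_a π0(a|s) exp(β G(s,a)). The function names are hypothetical, and this is the common textbook form of the update, not necessarily the chapter's exact formulation.

```python
import numpy as np


def free_energy(G_row, prior_row, beta):
    """F(s) = (1/beta) * log sum_a pi0(a|s) * exp(beta * G(s, a)).

    Computed with the log-sum-exp trick for numerical stability.
    Assumes beta > 0 and a strictly positive prior (hypothetical setup).
    """
    z = beta * G_row + np.log(prior_row)
    m = z.max()
    return (m + np.log(np.exp(z - m).sum())) / beta


def g_learning_update(G, prior, s, a, r, s_next, alpha, gamma, beta):
    """One tabular step: G(s,a) += alpha * (r + gamma * F(s') - G(s,a))."""
    target = r + gamma * free_energy(G[s_next], prior[s_next], beta)
    G[s, a] += alpha * (target - G[s, a])
    return G


def policy(G, prior, s, beta):
    """Prior tilted by the G-function: pi(a|s) = pi0(a|s) * exp(beta * (G(s,a) - F(s)))."""
    F = free_energy(G[s], prior[s], beta)
    return prior[s] * np.exp(beta * (G[s] - F))


# Usage: the free energy interpolates between the prior-weighted mean of G
# (small beta) and the hard max used by ordinary Q-learning (large beta).
row = np.array([1.0, 2.0, 3.0])
uniform = np.full(3, 1.0 / 3.0)
print(free_energy(row, uniform, beta=1e-6))  # close to 2.0
print(free_energy(row, uniform, beta=1e3))   # close to 3.0
```

The single parameter β controls how far the learned policy may deviate from the prior π0. This is the sense in which the method is "probabilistic": it yields a stochastic policy rather than a greedy one. In the β → ∞ limit, the update reduces to the standard Q-learning target.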