Discrete Event Dynamic Systems

Volume 19, Issue 1, pp 91–113

A New Learning Algorithm for Optimal Stopping

  • Vivek S. Borkar
  • Jervis Pinto
  • Tarun Prabhu


A linear programming formulation of the optimal stopping problem for Markov decision processes is approximated using linear function approximation. Using this formulation, a reinforcement learning scheme based on a primal-dual method and incorporating a sampling device called ‘split sampling’ is proposed and analyzed. An illustrative example from option pricing is also included.
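The linear program behind the abstract can be illustrated on a toy problem. The paper's own formulation, its function approximation and the split-sampling scheme are not reproduced on this page, so the sketch below only assumes the standard textbook form for a finite chain with discount factor alpha: minimise the sum of V(i) subject to V(i) >= g(i) (stop) and V(i) >= alpha * sum_j P[i][j] V(j) (continue). The optimal stopping value function is the solution of this program. The chain, the put-style payoff `g` and all function names are hypothetical choices made for illustration.

```python
# Minimal sketch (assumed toy setup, not the paper's algorithm):
# compute the optimal stopping value by value iteration, then check
# that it satisfies the two families of LP constraints.

def stopping_value(P, g, alpha, tol=1e-12, max_iter=10_000):
    """Iterate V <- max(g, alpha * P V) until successive iterates agree."""
    n = len(g)
    V = list(g)
    for _ in range(max_iter):
        cont = [alpha * sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
        new_V = [max(g[i], cont[i]) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    return V


def lp_slacks(P, g, alpha, V):
    """Slack of each LP constraint: V - g and V - alpha * P V."""
    n = len(g)
    stop_slack = [V[i] - g[i] for i in range(n)]
    cont_slack = [
        V[i] - alpha * sum(P[i][j] * V[j] for j in range(n)) for i in range(n)
    ]
    return stop_slack, cont_slack


# Hypothetical example: symmetric random walk on five price levels 1..5,
# holding at the boundaries, with payoff max(K - price, 0) and K = 4.
P = [
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5],
]
g = [3.0, 2.0, 1.0, 0.0, 0.0]
alpha = 0.95

V = stopping_value(P, g, alpha)
stop_slack, cont_slack = lp_slacks(P, g, alpha, V)

for i in range(len(g)):
    action = "stop" if stop_slack[i] <= 1e-9 else "continue"
    print(f"state {i}: V = {V[i]:.4f}, {action}")
```

At every state both slacks are non-negative (feasibility) and at least one is zero (complementary slackness), which is what makes V the minimiser of the program. With linear function approximation, V would be replaced by a weighted combination of a few basis functions, so that the program is solved over the weights rather than over one variable per state.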


Keywords: Learning algorithm; Optimal stopping; Linear programming; Primal-dual methods; Split sampling; Option pricing



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Vivek S. Borkar (1)
  • Jervis Pinto (2, 3)
  • Tarun Prabhu (2, 4)

  1. Tata Institute of Fundamental Research, Mumbai, India
  2. St. Francis Institute of Technology, Mumbai, India
  3. School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA
  4. School of Computing, University of Utah, Salt Lake City, USA
